Abstract

In proofs of $L_2$-differentiability, Lebesgue densities of a central distribution are often assumed right from the beginning. Generalizing Huber (1981, Theorem 4.2), we show that in the class of smooth parametric group models these densities are in fact consequences of a finite Fisher information of the model, provided a suitable representation of the latter is used. The proof uses the notions of absolute continuity in $k$ dimensions and weak differentiability.
As examples to which this theorem applies, we spell out a number of models including a correlation model and the general multivariate location and scale model.

As a consequence of this approach, we show that in the (multivariate) location scale model, finiteness of Fisher information as defined here is in fact equivalent to $L_2$-differentiability and to a log-likelihood expansion giving local asymptotic normality of the model.

Paralleling Huber's proofs for existence and uniqueness of a minimizer of Fisher information to our situation, we get existence of a minimizer in any weakly closed set $\mathcal{F}$ of central distributions $F$. If, in addition to assumptions analogous to those of Huber (1981), a certain identifiability condition for the transformation holds, we obtain uniqueness of the minimizer. This identifiability condition is satisfied in the multivariate location scale model.

Keywords: Fisher information, group models, multivariate location and scale model, 
correlation estimation, minimum Fisher information, absolute continuity, weak differentiability, 
LAN, L2 differentiability, smoothness; 
1. Introduction



1.1. Motivation



$L_2$-differentiability as introduced by LeCam and Hájek appears to be the most suitable setup in which to derive such key properties as local asymptotic normality (LAN) in local asymptotic parametric statistics. In order to show this $L_2$-differentiability, however, Lebesgue densities of a central distribution are frequently assumed right from the beginning. In this paper, we generalize Huber (1981, Theorem 4.2) from one-dimensional location to a large class of parametric models, where these Lebesgue densities are in fact a consequence of a finite Fisher information of the model, provided a suitable definition of the latter is used. This definition may then serve, again as in Huber (1981), as starting point for minimizing Fisher information along suitable neighborhoods of the model.

The framework in which this generalization holds covers smooth parametric group models as to be found in Bickel et al. (1998), but is valid even in a somewhat more general setting: The idea is to link transformations in the parameter space to transformations in the observation space. The new definition of Fisher information then simply amounts to transferring differentiation in the parameter space to differentiation, in a weak sense, in the observation space. This is actually done much in a Sobolev spirit, working with generalized derivatives.

1.2. Organization of the Paper 

After an introduction to the setup of smooth parametric group models, in Section 2 we list the smoothness requirements for the transformations and some notation needed for our theorem. Before stating this theorem, in Section 3 we first give a number of examples to which this theorem applies, the most general of which is the multivariate location and scale model from Example 3.7. Section 4 provides the main result, Theorem 4.4. In Section 5 we spell out the resulting Fisher information in the examples of Section 3. As announced in the motivation, in Section 6, culminating in Proposition 6.2, we show that in the (multivariate) location-scale model finiteness of Fisher information is equivalent to $L_2$-differentiability as well as to a LAN property. Finally, in Section 7 we generalize Huber's proofs for existence and uniqueness of a minimizer of Fisher information to our situation. The proofs are gathered in Appendix B. The proof of Theorem 4.4 makes use of the notions of absolute continuity in $k$ dimensions and of weak differentiability. Both are provided in Appendix A.



Remark 1.1. The one-dimensional scale model, a particular case of what is covered by this paper, has been spelt out separately, in a small joint paper with Helmut Rieder, cf. Ruckdeschel and Rieder (2010).

2. Setup 

2.1. Notation 

$\mathbb{B}^k$ denotes the Borel $\sigma$-algebra on $\mathbb{R}^k$, $\mathcal{M}_1(\mathcal{A})$ [$\mathcal{M}_s(\mathcal{A})$] the set of all probability [substochastic] measures on some $\sigma$-algebra $\mathcal{A}$, and for $\mu \in \mathcal{M}_1(\mathbb{B})$, for $p \in [1,\infty]$, $L_p(\mu)$ is the set of all (equivalence classes of) $\mathbb{B}$ measurable functions $X$ with $E|X|^p < \infty$, resp. $\sup_\mu |X| < \infty$. $\mathrm{I}_A$ denotes the indicator function of the set $A$. $\mathbb{I}_k$ is the $k$-dimensional unit matrix, $\mathrm{vec}(A)$ is the operator casting a matrix to a vector, stacking the columns of $A$ over each other, $\mathrm{vech}$ the operator casting the upper half of a square matrix to a vector, including the diagonal, and $A \otimes B$ the Kronecker product of matrices, and, for $A, B \in \mathbb{R}^{k\times k}$, the symmetrized product $A \circledast B := (AB + B^\tau A^\tau)/2$.

For $l \in \mathbb{N}_0 \cup \{\infty\}$ let $\mathcal{C}^l$ be the set of all $l$ times continuously differentiable functions, where, if necessary, we specify domain and range in the notation $\mathcal{C}^l(\mathrm{domain}, \mathrm{range})$. Weak convergence of measures $P_n \in \mathcal{M}_1(\mathbb{B}^k)$ to some measure $P \in \mathcal{M}_1(\mathbb{B}^k)$ is denoted by $P_n \to_w P$.
Inequalities and intervals in $\mathbb{R}^k$ are denoted by the same symbols as in one dimension, meaning e.g. $l < r$ iff $l_i < r_i$ for all $i = 1,\dots,k$, and $[l,r] := \{x \in \mathbb{R}^k \mid l_i \le x_i \le r_i,\ \forall i = 1,\dots,k\}$.

Let $P_\theta \in \mathcal{M}_1(\mathbb{B}^k)$. $\mathbb{R}^k$ being Polish, regular conditional distributions are available, and we may write $P_\theta(dx_1,\dots,dx_k)$ as

$$P_\theta(dx_1,\dots,dx_k) = \prod_{j=1}^{k-1} P_{\theta;j|j+1:k}(dx_j \mid x_{j+1},\dots,x_k)\; P_{\theta;k}(dx_k) \qquad (2.1)$$

with $P_{\theta;k}$ the marginal of $X_k$ and $P_{\theta;j|j+1:k}$ a regular conditional distribution of $X_j$, given $X_{j+1} = x_{j+1},\dots,X_k = x_k$. In the sequel, we write $y_{i:j}$ for the vector $(y_i,\dots,y_j)^\tau$. For a measure $G$ on $\mathbb{B}^k$ and a set of indices $J$ we write $G_J$ to denote the joint marginal of $G$ for coordinates $i \in J$. For $y \in \mathbb{R}^k$ define $y_{-i} := y_{1:i-1;i+1:k}$, and for $y \in \mathbb{R}^{k-1}$ and $x \in \mathbb{R}$ define the expression $(x:y)_i := (y_{1:i-1}, x, y_{i:k-1})^\tau \in \mathbb{R}^k$.
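The insertion and deletion conventions $(x:y)_i$ and $y_{-i}$ recur in every section-wise argument below, so a minimal sketch may help fix them. The function names `insert_coordinate` and `drop_coordinate` are ours and purely illustrative; indices are 1-based as in the text.

```python
def insert_coordinate(x, y, i):
    """Return (x:y)_i: put the scalar x at 1-based position i of the (k-1)-vector y."""
    y = list(y)
    if not 1 <= i <= len(y) + 1:
        raise ValueError("i must lie in 1..k")
    return y[: i - 1] + [x] + y[i - 1 :]


def drop_coordinate(z, i):
    """Return z_{-i}: the k-vector z with its i-th coordinate (1-based) removed."""
    z = list(z)
    return z[: i - 1] + z[i:]


z = insert_coordinate(9.0, [1.0, 2.0, 3.0], 2)
print(z)                      # [1.0, 9.0, 2.0, 3.0]
print(drop_coordinate(z, 2))  # [1.0, 2.0, 3.0]
```

Dropping the coordinate that was just inserted recovers $y$, which is the relation used when a function on $\mathbb{R}^k$ is studied along its $i$-th coordinate sections.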

2.2. Model Definition 

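For a fixed central distribution $F$ on $\mathbb{B}^k$, we consider a statistical model $\mathcal{P} \subset \mathcal{M}_1(\mathbb{B}^k)$ generated by a family $\mathcal{G}$ of diffeomorphisms $\tau: \mathbb{R}^k \to \mathbb{R}^k$ defined on the observation space. Denote the inverse of $\tau$ by $\iota = \tau^{-1}$. This family is parametrized by a $p$ dimensional parameter $\theta$, stemming from an open parameter set $\Theta \subset \mathbb{R}^p$, and this induces the parametric model

$$\mathcal{P} = \{P_\theta \mid P_\theta = \tau_\theta(F),\ \theta \in \Theta\} \qquad (2.2)$$

where $\tau_\theta(F)$ denotes the image measure under $\tau_\theta$, $F \circ \iota_\theta$.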

Remark 2.1. In most examples, $\mathcal{G}$ will be a group, which is also the formulation used in Lehmann (1983, Section 1.3) and Bickel et al. (1998, Ch. 4). These authors did not intend to generalize Fisher information, though, and Example 3.5 shows that for our purposes a group structure for the set $\mathcal{G}$ is not necessary.

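2.3. A Smooth Compactification of $\mathbb{R}^k$

For reasons explained in Remark 4.1 we introduce the following compactification $\bar{\mathbb{R}}^k$ of $\mathbb{R}^k$:

Definition 2.2. Let $\mathcal{C}^l([0,1]^k, \mathbb{R})$, $l \in \mathbb{N} \cup \{\infty\}$, be the space of all continuous real-valued functions on the domain $[0,1]^k$ which are differentiable $l$ times / arbitrarily often in $(0,1)^k$, and with existing one-sided derivatives on $\partial[0,1]^k$. We identify this space with functions on $\bar{\mathbb{R}}^k$, using the isometry

$$\ell: [-\infty;\infty]^k \to [0,1]^k, \qquad [\ell((x_j)_j)]_i = 1/(e^{-x_i} + 1) \qquad (2.3)$$

i.e. let

$$\mathcal{C}^l(\bar{\mathbb{R}}^k, \mathbb{R}) := \mathcal{C}^l([0,1]^k, \mathbb{R}) \circ \ell = \{\varphi \mid \varphi = \psi \circ \ell,\ \exists\, \psi \in \mathcal{C}^l([0,1]^k, \mathbb{R})\} \qquad (2.4)$$

For later purposes we also note the inverse of $\ell$

$$\kappa: [0,1]^k \to [-\infty;\infty]^k, \qquad \kappa(y_1,\dots,y_k) = (\log(y_i/(1-y_i)))_{i=1,\dots,k} \qquad (2.5)$$

In the same manner, unbounded, continuous functions are defined and denoted by $\mathcal{C}^l(\bar{\mathbb{R}}^k, \bar{\mathbb{R}}^n)$.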
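The maps (2.3) and (2.5) are the coordinatewise logistic function and its inverse, the logit. The following sketch, with function names `ell` and `kappa` chosen by us to mirror the notation, evaluates both including the boundary points $\pm\infty$ and checks numerically that they invert each other; it is an illustration only and plays no role in the proofs.

```python
import math


def ell(x):
    """Coordinatewise map (2.3): x_i -> 1 / (exp(-x_i) + 1), extended to +-infinity."""
    out = []
    for xi in x:
        if xi == math.inf:
            out.append(1.0)
        elif xi == -math.inf:
            out.append(0.0)
        elif xi >= 0:
            out.append(1.0 / (1.0 + math.exp(-xi)))
        else:
            # Equivalent form that avoids overflow of exp(-xi) for very negative xi.
            e = math.exp(xi)
            out.append(e / (1.0 + e))
    return out


def kappa(y):
    """Coordinatewise inverse (2.5): y_i -> log(y_i / (1 - y_i)), extended to 0 and 1."""
    out = []
    for yi in y:
        if yi == 0.0:
            out.append(-math.inf)
        elif yi == 1.0:
            out.append(math.inf)
        else:
            out.append(math.log(yi / (1.0 - yi)))
    return out


x = [-3.0, 0.0, 2.5]
roundtrip = kappa(ell(x))
print(max(abs(a - b) for a, b in zip(x, roundtrip)))  # close to zero
```

Because the derivative of the logistic function decays like $\exp(-|x|)$, every $\varphi = \psi \circ \ell$ inherits exponentially decaying derivatives, which is the property exploited in Remark 2.3.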

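Remark 2.3. (a) With this definition, $\bar{\mathbb{R}}^k$ becomes a compact metric space.

(b) Integrations along $\bar{\mathbb{R}}^k$ are understood as lifted onto $[0,1]^k$ by $\ell$, i.e. $\int_{\bar{\mathbb{R}}^k} f\,dP = \int_{[0,1]^k} f \circ \kappa\; d[\ell \circ P]$.

(c) The choice of $\ell$ resp. $\kappa$ is arbitrary to some extent, but satisfactory for our needs; in fact, we only have to impose $\ell \in \mathcal{C}^\infty(\mathbb{R}^k, \mathbb{R}^k)$, $\lim_{x\to-\infty}(\ell((x:y)_i))_i = 0$, $\lim_{x\to\infty}(\ell((x:y)_i))_i = 1$, for each $y \in \mathbb{R}^{k-1}$, $\ell$ strictly isotone in each coordinate, $|\ell'(x) D(\tau_\theta(x))| \in L_2(F)$, or, for uniformity in $\mathcal{M}_1(\mathbb{B}^k)$, $\sup_x |\ell'(x) D(\tau_\theta(x))| < \infty$.

(d) For every $\varphi \in \mathcal{C}^\infty(\bar{\mathbb{R}}, \mathbb{R})$, the limits $\lim_{x\to\pm\infty} \varphi(x)$ exist and $\lim_{x\to\pm\infty} \frac{d^l}{dx^l}\varphi(x) = 0$ for $l > 0$, as is easily seen using the chain rule and by the fact that each summand arising in a derivative has at least a factor decaying as $\exp(-|x|)$. This also implies that there are functions $\varphi: \mathbb{R} \to \mathbb{R}$ which do not lie in $\mathcal{C}^\infty(\bar{\mathbb{R}}, \mathbb{R})$ but which are in $\mathcal{C}^\infty(\mathbb{R}, \mathbb{R})$, have existing $\lim_{x\to\pm\infty} \varphi(x)$, and for which $\lim_{x\to\pm\infty} \varphi^{(k)}(x) = 0$ for $k > 0$: Take $1/(x^2+1)$, which has no exponentially decaying derivatives.

(e) Consequently, for all $\varphi \in \mathcal{C}^\infty(\bar{\mathbb{R}}, \mathbb{R})$, $\int |b\varphi'|\,d\lambda$ is finite for any bounded, measurable function $b$ and $\lim_{|x|\to\infty} \varphi'(x)|x|^k = 0$ for all $k \in \mathbb{N}$, hence in particular $\varphi'(x)|x|^k$ is in $L_2(P)$ for every probability $P$ on $\mathbb{B}$.

(f) If we allow for mass of $\ell \circ P$ in $[0,1]^k \setminus (0,1)^k$, corresponding to measures in $\mathcal{M}_1(\bar{\mathbb{B}}^k)$, the class $\mathcal{C}_c^\infty(\mathbb{R}^k, \mathbb{R})$ of compactly supported functions in $\mathcal{C}^\infty(\mathbb{R}^k, \mathbb{R})$ cannot distinguish any measures $P_1 \ne P_2$ on $\bar{\mathbb{B}}^k$ coinciding on $\mathbb{B}^k$, whereas $\mathcal{C}^\infty(\bar{\mathbb{R}}^k, \mathbb{R})$ is measure determining on $\bar{\mathbb{B}}^k$.

(g) The measures $P_\theta$ arising in our model from subsection 2.2 are understood as members of $\mathcal{M}_1(\bar{\mathbb{B}}^k)$, defining $P_\theta(A) = P_\theta(A \cap \mathbb{R}^k)$ for $A \in \bar{\mathbb{B}}^k$.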

2.4. Assumptions 

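Throughout this paper, we make the following set of assumptions concerning the transformations $\tau$, which are needed to link differentiation w.r.t. $\theta$ to differentiation w.r.t. $x$:

(I) $P_{\theta_1} = P_{\theta_2} \iff \theta_1 = \theta_2$.

(D) $\theta \mapsto \iota_\theta(x)$ is differentiable with derivative $\partial_\theta \iota_\theta(x)$.

(Dk) If $k > 1$, $x \mapsto \iota_\theta(x)$ is twice differentiable with second derivative $\partial_x^2 \iota_\theta(x)$.

(C1) If $k = 1$, $x \mapsto D$ is in $\mathcal{C}^1(\mathbb{R}, \mathbb{R}^p)$ and $x \mapsto e^{-|x|} D \circ \tau_\theta(x)$ is in $L_2^p(F)$ with

$$D = D_\theta^{(1)}(x) = \partial_\theta \iota_\theta(x) / \partial_x \iota_\theta(x) \qquad (2.6)$$

(Ck) If $k > 1$, $x \mapsto D$ is in $\mathcal{C}^1(\mathbb{R}^k, \mathbb{R}^{k\times p})$, $x \mapsto e^{-|x|} D \circ \tau_\theta(x)$ is in $L_2^{k\times p}(F)$ and $x \mapsto V \circ \tau_\theta(x)$ is in $L_2^p(F)$ with

$$J = (J_\theta(x))_{i,j=1,\dots,k} = ((\partial_x \iota_\theta)^{-1})_{i,j}(x) \qquad (2.7)$$

$$D = (D_\theta^{(k)}(x))_{i=1,\dots,k;\ j=1,\dots,p} = [J\,\partial_\theta \iota_\theta]_{i,j}(x) \qquad (2.8)$$

$$V = (V_\theta(x))_{j=1,\dots,p} = \frac{\sum_{i=1}^k \partial_{x_i}(|\det \partial_x \iota_\theta|\, D_{i,j}) - \partial_{\theta_j}|\det \partial_x \iota_\theta|}{|\det \partial_x \iota_\theta|} \qquad (2.9)$$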

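Remark 2.4. Using $\frac{\partial}{\partial A_{i,j}} \det A = (A^{-1})_{j,i} \det A$ and $\frac{\partial}{\partial A_{i,j}} (A^{-1})_{k,l} = -(A^{-1})_{k,i}(A^{-1})_{j,l}$ and the chain rule of differentiation, one can show

$$V_j = [|\det \partial_x \iota_\theta|]^{-1}\Big[\sum_{i=1}^k \{\partial_{x_i}(|\det \partial_x \iota_\theta|\, D_{i,j})\} - \partial_{\theta_j}|\det \partial_x \iota_\theta|\Big] =$$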

= Y.i,Lr,m=liJl,iJr.m-JLrJm,i)djc.x^lg-^,ndg.lg.j, (2.10) 

which motivates requirement (Dk). 

In the sequel we use these abbreviations: 

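Notation 2.5. The set $\{D = 0\}$ is denoted by $K$. With $e_i$ the $i$-th canonical unit vector in $\mathbb{R}^k$ and some $a \in \mathbb{R}^p$ and $y \in \mathbb{R}^{k-1}$, define

$$V_a := V^\tau a, \quad D_a := Da, \quad D_{a;i} := e_i^\tau Da, \quad K_i := \{e_i^\tau D = 0\} \qquad (2.11)$$

Also, for later purposes, c.f. (4.4), we introduce the functions

$$\tilde D = (\tilde D_\theta^{(k)}(x))_{i,j} = [\partial_\theta \iota_\theta \circ \tau_\theta]_{i,j}(x), \qquad \tilde V = (\tilde V_\theta(x))_{j=1,\dots,p} = V_\theta \circ \tau_\theta \qquad (2.12)$$

$$\tilde V_a := \tilde V^\tau a, \quad \tilde D_a := \tilde D a, \quad \tilde D_{a;i} := e_i^\tau \tilde D a, \quad \tilde K_i := \{e_i^\tau \tilde D = 0\} \qquad (2.13)$$

Finally, if $F \ll \lambda^k$, we write $f_\theta$ for $f \circ \iota_\theta$, with $f$ a $\lambda^k$ density of $F$.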



We also introduce the following decomposition of $P_\theta$:

$$P_\theta = P_\theta^{(0)} + \bar P_\theta^{(0)}, \qquad P_\theta^{(0)}(\cdot) := P_\theta(\cdot \cap K). \qquad (2.14)$$



3. Examples 



For the following seven popular examples we spell out the transformations Tg (x) and the 
respective parameter space and verify the assumptions from the preceding section. 

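Example 3.1 (one-dim. location). $\tau_\theta(x) := x + \theta$, $\theta \in \Theta_1 = \mathbb{R}$, $p = k = 1$.

For each $\theta \in \Theta_1$, $\tau_\theta(\cdot)$ is a diffeomorphism; assumptions (I), (D) and (C1) are satisfied: $\partial_\theta \iota_\theta = -1$, $\partial_x \iota_\theta = 1$, $D(x) = -1$, $K = \emptyset$; any observation $x$ is informative for this problem.

Example 3.2 ($k$-dim. location, $k > 1$). $\tau_\theta(x) := x + \theta$, $\theta \in \Theta_2 = \mathbb{R}^k$, $p = k$.

For each $\theta \in \Theta_2$, $\tau_\theta(\cdot)$ is a diffeomorphism; assumptions (I), (D), (Dk), and (Ck) are satisfied: $\partial_x^2 \iota_\theta = 0$, $\partial_\theta \iota_\theta = -\partial_x \iota_\theta = D = -\mathbb{I}_k$, $V = 0$, $K = \emptyset$; any observation $x$ carries information for this problem.

Example 3.3 (one-dim. scale). $\tau_\theta(x) := \theta x$, $\theta \in \Theta_3 = \mathbb{R}_{>0}$, $p = k = 1$.
For each $\theta \in \Theta_3$, $\tau_\theta(\cdot)$ is a diffeomorphism; assumptions (I), (D) and (C1) are satisfied: $\partial_\theta \iota_\theta = -x/\theta^2$, $\partial_x \iota_\theta = 1/\theta$, $D(x) = -x/\theta$. Thus $K = \{0\}$, hence the point $x = 0$ is not informative for this problem, and any $x \ne 0$ is.

Example 3.4 (one-dim. loc. and scale). $\tau_\theta(x) := \theta_2 x + \theta_1$, $\theta \in \Theta_4 = \Theta_1 \times \Theta_3$, $k = 1$, $p = 2$.
For each $\theta \in \Theta_4$, $\tau_\theta(\cdot)$ is a diffeomorphism; assumptions (I), (D) and (C1) are satisfied: consider $\partial_\theta \iota_\theta = -(\frac{1}{\theta_2}; \frac{x-\theta_1}{\theta_2^2})^\tau$, $\partial_x \iota_\theta = \frac{1}{\theta_2}$, $D(x) = -(1; \frac{x-\theta_1}{\theta_2})^\tau$, $K = \emptyset$; any observation $x$ carries information for this problem.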
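The closed forms for $D$ in Examples 3.1 to 3.4 all follow from (2.6), $D = \partial_\theta \iota_\theta / \partial_x \iota_\theta$. As a sanity check, the sketch below approximates both derivatives of the location-scale inverse $\iota_\theta(x) = (x - \theta_1)/\theta_2$ by central differences and compares the quotient with $-(1; (x-\theta_1)/\theta_2)^\tau$. The helper names and the step size `h` are our own choices for illustration.

```python
def iota(theta, x):
    """Inverse transformation of the one-dim. location-scale model: (x - theta1) / theta2."""
    theta1, theta2 = theta
    return (x - theta1) / theta2


def D_numeric(theta, x, h=1e-6):
    """Approximate D = d_theta iota / d_x iota of (2.6) by central differences."""
    theta1, theta2 = theta
    d_x = (iota(theta, x + h) - iota(theta, x - h)) / (2 * h)
    d_t1 = (iota((theta1 + h, theta2), x) - iota((theta1 - h, theta2), x)) / (2 * h)
    d_t2 = (iota((theta1, theta2 + h), x) - iota((theta1, theta2 - h), x)) / (2 * h)
    return (d_t1 / d_x, d_t2 / d_x)


def D_closed_form(theta, x):
    """Closed form from Example 3.4: D(x) = -(1; (x - theta1) / theta2)."""
    theta1, theta2 = theta
    return (-1.0, -(x - theta1) / theta2)


theta, x = (1.0, 2.0), 4.0
print(D_numeric(theta, x))
print(D_closed_form(theta, x))  # (-1.0, -1.5)
```

Setting $\theta_1 = 0$ recovers the pure scale case of Example 3.3, where the second component reduces to $-x/\theta$ and vanishes exactly at $x = 0$, the only non-informative point.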

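Example 3.5 (correlation, $k = 2$; $p = 1$). To $\sigma_1, \sigma_2 > 0$ known let $\theta \in \Theta_5 = (-1;1)$

$$\tau_\theta: \mathbb{R}^2 \to \mathbb{R}^2, \quad x \mapsto \tau_\theta(x) := J_\theta x, \qquad J_\theta = \begin{pmatrix} \sigma_1\sqrt{1-\theta^2} & \theta\sigma_1 \\ 0 & \sigma_2 \end{pmatrix} \qquad (3.1)$$

In contrast to all other examples considered here, this family does not form a group; this may easily be seen, as $J_\theta^{-1}$ does not admit a representation according to (3.1). For each $\theta \in \Theta_5$, $\tau_\theta(\cdot)$ is a diffeomorphism; assumptions (I), (D), (Dk), and (Ck) are clearly satisfied: $\partial_x^2 \iota_\theta = 0$, $V = 0$, $\partial_\theta \iota_\theta = (1-\theta^2)^{-3/2}(x_1\theta\sigma_1^{-1} - x_2\sigma_2^{-1}; 0)^\tau$,

$$\partial_x \iota_\theta = (1-\theta^2)^{-1/2} \begin{pmatrix} \sigma_1^{-1} & -\theta\sigma_2^{-1} \\ 0 & (1-\theta^2)^{1/2}/\sigma_2 \end{pmatrix},$$

$D = ([\theta x_1 - \frac{\sigma_1}{\sigma_2} x_2]/(1-\theta^2), 0)^\tau$. As $K = \{x \in \mathbb{R}^2 \mid \exists \rho \in \mathbb{R}: x = \rho(\sigma_1, \theta\sigma_2)^\tau\}$, $P(K) < 1$ holds, as long as $\mathrm{supp}(P_\theta)$ is not contained in the line $\{\rho(1; \theta\sigma_2/\sigma_1)^\tau,\ \rho \in \mathbb{R}\}$ or equivalently, as long as $\mathrm{supp}(F) \not\subset \{\rho((1-\theta^2)^{1/2}; \theta)^\tau,\ \rho \in \mathbb{R}\}$.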

Example 3.6 ($k$-dim. scale, $k > 1$). $\tau_\theta(x) := \theta x$, defined for $\Theta_6 = \{S \in \mathbb{R}^{k\times k} \mid S = S^\tau \succ 0\}$, $p = \binom{k+1}{2}$. The symmetry restriction is imposed on $\mathbb{R}^{k\times k}$, allowing only for symmetric variations in the parameter.
Again, for each $\theta \in \Theta_6$, $\tau_\theta(\cdot)$ is a diffeomorphism; assumptions (I), (D), (Dk), and (Ck) are satisfied: $\partial_x^2 \iota_\theta = 0$, $\partial_{x_i} \iota_{\theta;j} = (\theta^{-1})_{j,i}$, $V = 0$.

For each symmetric matrix $a \in GL(k)$, we have $D(x)a = \theta^{-1} a \theta^{-1} x$; $K = \{0\}$; any observation $x \ne 0$ carries information for this problem.

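Example 3.7 ($k$-dim. location and scale, $k > 1$). $\tau_\theta(x) := \theta_2 x + \theta_1$, for $\theta \in \Theta_7 = \mathbb{R}^k \times \Theta_6$, $p = k + \binom{k+1}{2} = k(k+3)/2$.

For each $\theta \in \Theta_7$, $\tau_\theta(\cdot)$ is a diffeomorphism; assumptions (I), (D), (Dk), and (Ck) are satisfied: $\partial_x^2 \iota_\theta = 0$, $V = 0$, $\partial_x \iota_\theta = \theta_2^{-1}$; splitting off the indices for the parametric dimensions into the location part [a single index] and the scale part [a double index], we get

$$\partial_{\theta_{2;j,l}} \iota_{\theta;i} = -\tfrac12\big[(\theta_2^{-1})_{i,j}(\theta_2^{-1}(x-\theta_1))_l + (\theta_2^{-1})_{i,l}(\theta_2^{-1}(x-\theta_1))_j\big];$$

$$D_{i,j} = -(\mathbb{I}_k)_{i,j},$$
$$D_{i;j,l} = -\tfrac12\big[(\mathbb{I}_k)_{i,j}(\theta_2^{-1}(x-\theta_1))_l + (\mathbb{I}_k)_{i,l}(\theta_2^{-1}(x-\theta_1))_j\big].$$

Just as in Example 3.2, any observation $x$ carries information for this problem.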



4. Main Theorem 



In Huber (1981, Definition 4.1 and Theorem 4.2), we find a result on the Fisher information in the one dimensional location case which is central for the famous minimax M estimator result of Huber (1964). The idea is to express Fisher information as a supremum, i.e.

$$\mathcal{I}(F) = \sup\Big\{\frac{(\int \varphi'\,dF)^2}{\int \varphi^2\,dF}\ \Big|\ 0 \ne \varphi \in \mathcal{D}_1\Big\} \qquad (4.1)$$

With this definition, Huber (1981, Thm 4.2) achieves a representation of Fisher information without assuming densities of the central distribution: $\mathcal{I}(F)$ is finite iff $F$ is a.c. with a.c. Lebesgue density $f$ such that $\int (f'/f)^2 f\,dx < \infty$, which in this case is just $\mathcal{I}(F)$.
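To see the supremal representation (4.1) at work, one can evaluate the ratio for the standard normal distribution, whose Fisher information of location equals 1. The sketch below uses the compactly supported $\mathcal{C}^1$ functions $\varphi_R(x) = x(1 - x^2/R^2)^2$ on $[-R, R]$; this family, the trapezoidal quadrature and all function names are our own illustrative choices, not part of Huber's argument. By the Cauchy-Schwarz inequality each ratio stays below 1, and the values approach 1 as $R$ grows because $\varphi_R$ then approximates the optimal direction $-f'/f = x$.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def trapezoid(g, a, b, n):
    """Composite trapezoidal rule for g on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        total += g(a + i * h)
    return total * h


def huber_ratio(R, n=8000):
    """(int phi' dF)^2 / int phi^2 dF for F = N(0,1) and phi(x) = x (1 - (x/R)^2)^2 on [-R, R]."""

    def phi(x):
        u = x / R
        return x * (1.0 - u * u) ** 2

    def dphi(x):
        u2 = (x / R) ** 2
        return (1.0 - u2) * (1.0 - 5.0 * u2)

    num = trapezoid(lambda x: dphi(x) * normal_pdf(x), -R, R, n) ** 2
    den = trapezoid(lambda x: phi(x) ** 2 * normal_pdf(x), -R, R, n)
    return num / den


ratios = [huber_ratio(R) for R in (2.0, 4.0, 8.0, 16.0)]
print(ratios)  # every entry is below 1; the last one is close to 1
```

The same computation with a distribution lacking an absolutely continuous density would let the ratio grow without bound along suitable test functions, which is the content of the "finite iff" statement.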



Remark 4.1. (a) The proof in Huber (1981) is credited to T. Liggett and is based on Sobolev-type ideas; we take these up to generalize the result to more general models and higher dimensions.
(b) The set $\mathcal{D}_1$ in (4.1) plays the role of a set of test functions as in the theory of generalized functions, compare Rudin (1991, Ch. 6). In the cited reference, Huber uses $\mathcal{D}_1 = \mathcal{C}_c^1(\mathbb{R}, \mathbb{R})$, the subset of compactly supported functions in $\mathcal{C}^1(\mathbb{R}, \mathbb{R})$. In the proof later, we will need that the sets

$$\mathcal{D}_{D;j} := \{D_{a;j}\,\partial_{x_j}\varphi \mid \varphi \in \mathcal{D}_k,\ a \in \mathbb{R}^p\}$$

are dense in $L_2(P_\theta)$. Contrary to the one-dimensional location case, for $\mathcal{D}_k = \mathcal{C}_c^1(\mathbb{R}^k, \mathbb{R})$ and general $D_{a;j}$, we did not succeed in proving this; nor can we work with the set of continuously differentiable functions with compactly supported derivatives, as used for the one dimensional scale model in Ruckdeschel and Rieder (2010, Lem. A.1): The crucial approximation of the constant function 1 by functions $\varphi_n \in \mathcal{C}_c^1(\mathbb{R}^k, \mathbb{R})$, with $|\varphi_n| \le 1$, $|D_{a;j}\,\partial_{x_j}\varphi_n| \le 1$, and $|D_{a;j}\,\partial_{x_j}\varphi_n| \to 0$ pointwise, fails for functions $D_{a;j}$ growing faster than $|x|$ for large $|x|$. Hence, instead we use the larger set $\mathcal{C}^\infty(\bar{\mathbb{R}}^k, \mathbb{R})$ from Definition 2.2.



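Definition 4.2. In model $\mathcal{P}$ from (2.2), assume (I) and (D). Let $a \in \mathbb{R}^p$, $|a| = 1$.
$k = 1$: Assume (C1). Let $\mathcal{D}_1 = \mathcal{C}^\infty(\bar{\mathbb{R}}, \mathbb{R})$, $D$ from (2.6). Then for $\theta \in \Theta$ we define

$$\mathcal{I}_\theta(F;a) := \sup\Big\{\frac{(\int [\varphi' D_a]\,dP_\theta)^2}{\int \varphi^2\,dP_\theta}\ \Big|\ 0 \ne \varphi \in \mathcal{D}_1\Big\} \qquad (4.2)$$

$k > 1$: Assume (Dk) and (Ck). Let $\mathcal{D}_k = \mathcal{C}^\infty(\bar{\mathbb{R}}^k, \mathbb{R})$, $D$ and $V$ from (2.8) and (2.9). Then for $\theta \in \Theta$ we define

$$\mathcal{I}_\theta(F;a) := \sup\Big\{\frac{(\int [\nabla\varphi^\tau D_a + \varphi V_a]\,dP_\theta)^2}{\int \varphi^2\,dP_\theta}\ \Big|\ 0 \ne \varphi \in \mathcal{D}_k\Big\} \qquad (4.3)$$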

Remark 4.3. (a) As $\tau_\theta$, resp. $\iota_\theta$ map $\mathcal{D}_k$ onto itself, we may use the identification $\psi = \varphi \circ \tau_\theta$ to see that by the transformation formula

$$\mathcal{I}_\theta(F;a) = \sup\Big\{\frac{(\int [\nabla\psi^\tau \tilde D_a + \psi \tilde V_a]\,dF)^2}{\int \psi^2\,dF}\ \Big|\ 0 \ne \psi \in \mathcal{D}_k\Big\} \qquad (4.4)$$

(b) In particular, the transformation formula $\int p(x)\,P_\theta(dx) = \int p \circ \tau_\theta\,dF$ entails that except for the correlation model of Example 3.5 finiteness of the Fisher information for one $\theta \in \Theta$ implies finiteness for every $\theta \in \Theta$: Indeed, considering $D_\theta^{(k)} \circ \tau_\theta$ in all these models, we see that in every case, $D_\theta^{(k)} \circ \tau_\theta = D_{\mathrm{id}}^{(k)}$, where we write id referring to the parameter-value $\theta$ yielding $\iota_\theta = \mathrm{id}$, while at the same time $V = 0$. So in fact we could define the Fisher information of $F$ for one reference parameter, and its finiteness then entails finiteness in the whole parametric model.

(c) In general, finiteness will however depend on the actual parameter value, which is why we define Fisher information at $F$ with reference to $\theta$, notationally transparent as $\mathcal{I}_\theta(F;a)$.

With Definition 4.2 we generalize Huber (1981, Thm. 4.2) to



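Theorem 4.4. In model $\mathcal{P}$ from (2.2) assume that for some fixed $\theta \in \Theta$, (I), and, if $k = 1$, (D), and (C1), resp., if $k > 1$, (Dk) and (Ck) hold. Then (the sets of) statements (i) and (ii) are equivalent:

(i) $\mathcal{I}_\theta(F;a) < \infty$ for all $a$ with $|a| = 1$.

(ii) (a) $F$ admits a $\lambda^k$ density $f$ on $\iota_\theta(K^c)$.

(b) For every $a \in \mathbb{R}^p$, and $i = 1,\dots,k$

$$\lim_{x\to\pm\infty} [f_\theta\, |\det \partial_x \iota_\theta|\, D_{a;i}]((x:y)_i) = 0$$

(c) For every $a \in \mathbb{R}^p$, and $i = 1,\dots,k$, $f_\theta\, |\det \partial_x \iota_\theta|\, D_{a;i}$ is a.c. in $k$ dimensions in the sense of Definition A.3.
(d) For every $a \in \mathbb{R}^p$ and $1 \le i \le k$, [M^g^ + £-|^] $\in L_2(P_\theta)$.
If (i) resp. (ii) holds, $\mathcal{I}_\theta(F;a) = a^\tau \mathcal{I}_\theta(F)\, a$ with

$$\mathcal{I}_\theta = \int \Lambda_\theta \Lambda_\theta^\tau\,dP_\theta, \qquad \Lambda_\theta = (f'/f) \circ \iota_\theta\; \partial_\theta \iota_\theta + \frac{\partial_\theta |\det \partial_x \iota_\theta|}{|\det \partial_x \iota_\theta|}, \qquad (4.5)$$

respectively

$$\Lambda_\theta = \partial_\theta p_\theta / p_\theta \quad \text{with} \quad p_\theta = f_\theta\, |\det \partial_x \iota_\theta|. \qquad (4.6)$$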

Remark 4.5. (a) Theorem 4.4 also covers model 3.1; however, it uses $\mathcal{D}_1 = \mathcal{C}^\infty(\bar{\mathbb{R}}, \mathbb{R})$ instead of $\mathcal{C}_c^1(\mathbb{R}, \mathbb{R})$, hence, as $\mathcal{C}_c^\infty(\mathbb{R}, \mathbb{R}) \subset \mathcal{C}^\infty(\bar{\mathbb{R}}, \mathbb{R})$, finiteness of Fisher information in Huber's definition formally is weaker than ours, so formally our implication (ii) $\Rightarrow$ (i) is harder, (i) $\Rightarrow$ (ii) easier than his.

(b) As a consequence of using $\mathcal{D}_1 = \mathcal{C}^\infty(\bar{\mathbb{R}}, \mathbb{R})$, we need (ii)(b), which does not show up in the corresponding theorem of Huber (1981) (one-dim. location).

(c) In Theorem 4.4, $F$ may have $\lambda^k$ singular parts on $\iota_\theta K$. But if so, then by Corollary B.3 necessarily $\Lambda_\theta = 0$ there. This means that these parts do not contribute any information.

(d) Closedness of a.c. functions under products (Dudley, 2002, 7.2 Prob. 4) entails that under the assumptions of Theorem 4.4, whenever the map $x \mapsto [D_{a;i}\, p_\theta]((x:y)_i)$ is a.c. on some interval $[c,d]$ where $D_{a;i} \ne 0$, so is $p_\theta$.

5. Fisher information in Examples 

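In this section we specify the terms $\Lambda_\theta$ and $\mathcal{I}_\theta(F;a)$, as well as the quadratic form in $a$, $\mathcal{I}_\theta(F) = \mathcal{I}_\theta$, for Examples 3.1 to 3.7. In the sequel, $\Lambda_f(x) := -\partial_x f / f$.

Example 5.1 (one-dim. location). $\Lambda_\theta(x) := \Lambda_f(x - \theta)$, $\mathcal{I}_\theta = \mathcal{I}_0 = \int \Lambda_f^2\,dF$. The supremal definition of $\mathcal{I}(F)$ is (4.1), but with $\mathcal{D}_1 = \mathcal{C}^\infty(\bar{\mathbb{R}}, \mathbb{R})$.

Example 5.2 ($k$-dim. location, $k > 1$). $\Lambda_\theta(x) := \Lambda_f(x - \theta)$, $\mathcal{I}_\theta = \mathcal{I}_0 = \int \Lambda_f \Lambda_f^\tau\,dF$, $\mathcal{I}_\theta(F;a) = a^\tau \mathcal{I}_0\, a$. The supremal definition of $\mathcal{I}(F)$ is

$$\mathcal{I}_0(F;a) := \sup\Big\{\frac{(\int a^\tau \nabla\varphi\,dF)^2}{\int \varphi^2\,dF}\ \Big|\ 0 \ne \varphi \in \mathcal{D}_k\Big\} \qquad (5.1)$$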



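Example 5.3 (one-dim. scale). $\Lambda_\theta(x) := \frac{1}{\theta}[(x/\theta)\Lambda_f(x/\theta) - 1]$, $\mathcal{I}_\theta = \frac{1}{\theta^2}\mathcal{I}_1 = \frac{1}{\theta^2}\int (x\Lambda_f - 1)^2\,dF = \frac{1}{\theta^2}(\int x^2\Lambda_f^2\,dF - 1)$. The supremal definition of $\mathcal{I}(F)$ is

$$\mathcal{I}_1(F) := \sup\Big\{\frac{(\int x\,\varphi'(x)\,F(dx))^2}{\int \varphi^2\,dF}\ \Big|\ 0 \ne \varphi \in \mathcal{D}_1\Big\} \qquad (5.2)$$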
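For a concrete value of the scale information, take $F = \mathcal{N}(0,1)$, so that $\Lambda_f(x) = x$. The sketch below evaluates both integral expressions, $\int (x\Lambda_f - 1)^2\,dF$ and $\int x^2\Lambda_f^2\,dF - 1$, by a plain trapezoidal rule and divides by $\theta^2$. The quadrature range, grid size and function names are assumptions made for illustration; for the normal distribution both expressions equal $2/\theta^2$, since $E X^4 = 3$ and $E X^2 = 1$.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def trapezoid(g, a, b, n):
    """Composite trapezoidal rule for g on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        total += g(a + i * h)
    return total * h


def score_location(x):
    """Lambda_f(x) = -f'(x) / f(x), which equals x for the standard normal density."""
    return x


def scale_information(theta, lo=-12.0, hi=12.0, n=24000):
    """(1 / theta^2) * int (x Lambda_f - 1)^2 dF for F = N(0,1)."""
    base = trapezoid(lambda x: (x * score_location(x) - 1.0) ** 2 * normal_pdf(x), lo, hi, n)
    return base / theta**2


def scale_information_alt(theta, lo=-12.0, hi=12.0, n=24000):
    """(1 / theta^2) * (int x^2 Lambda_f^2 dF - 1) for F = N(0,1)."""
    second = trapezoid(lambda x: (x * score_location(x)) ** 2 * normal_pdf(x), lo, hi, n)
    return (second - 1.0) / theta**2


print(scale_information(1.0), scale_information_alt(1.0))
print(scale_information(2.0), scale_information_alt(2.0))
```

The agreement of the two functions reflects the identity $\int x\Lambda_f\,dF = 1$, which is what turns $\int (x\Lambda_f - 1)^2\,dF$ into $\int x^2\Lambda_f^2\,dF - 1$.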



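Example 5.4 (one-dim. location and scale).

$$\Lambda_\theta(x) = \frac{1}{\theta_2}\Lambda_{0;1}\Big(\frac{x-\theta_1}{\theta_2}\Big), \qquad \Lambda_{0;1}(x) = \big(\Lambda_f(x);\ x\Lambda_f(x) - 1\big)^\tau,$$

$$\mathcal{I}_\theta = \frac{1}{\theta_2^2}\mathcal{I}_{0;1}, \qquad \mathcal{I}_{0;1} = \begin{pmatrix} \int \Lambda_f^2\,dF & \int x\Lambda_f^2\,dF \\ \int x\Lambda_f^2\,dF & \int (x\Lambda_f - 1)^2\,dF \end{pmatrix}$$

and $\mathcal{I}_\theta(F;a) = \frac{1}{\theta_2^2}\, a^\tau \mathcal{I}_{0;1}\, a$. With $a = (a_l, a_s)^\tau$, the supremal definition of $\mathcal{I}(F)$ is

$$\mathcal{I}_{0;1}(F;a) := \sup\Big\{\frac{(\int (a_l + a_s x)\,\varphi'(x)\,F(dx))^2}{\int \varphi^2\,dF}\ \Big|\ 0 \ne \varphi \in \mathcal{D}_1\Big\} \qquad (5.3)$$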



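Example 5.5 (correlation, $k = 2$; $p = 1$).

$$\sigma_2\sigma_1\sqrt{1-\theta^2}\; P_\theta(dx_1, dx_2) = f\Big(\frac{x_1/\sigma_1 - \theta x_2/\sigma_2}{\sqrt{1-\theta^2}},\ \frac{x_2}{\sigma_2}\Big)\,\lambda^2(dx_1, dx_2) \qquad (5.4)$$

or with $f = f_{1|2}\, f_2$

$$\sigma_1\sqrt{1-\theta^2}\; p_{\theta;1|2}(x_1, x_2) = f_{1|2}\Big(\frac{x_1/\sigma_1 - \theta x_2/\sigma_2}{\sqrt{1-\theta^2}},\ \frac{x_2}{\sigma_2}\Big), \qquad \sigma_2\, p_{\theta;2}(x_2) = f_2\Big(\frac{x_2}{\sigma_2}\Big) \qquad (5.5)$$

and, with $\Lambda_{f_{1|2}} := -\partial_{x_1} f_{1|2} / f_{1|2}$,

$$(1-\theta^2)\,\Lambda_\theta(x_1, x_2) = \Lambda_{f_{1|2}}\Big(\frac{x_1/\sigma_1 - \theta x_2/\sigma_2}{\sqrt{1-\theta^2}},\ \frac{x_2}{\sigma_2}\Big)\, \frac{x_2/\sigma_2 - \theta x_1/\sigma_1}{\sqrt{1-\theta^2}} + \theta, \qquad \mathcal{I}_\theta = \int \Lambda_\theta^2\,dP_\theta \qquad (5.6)$$

The supremal definition of $\mathcal{I}(F)$ is

$$\mathcal{I}_\theta(F) := \sup\Big\{\frac{(\int \partial_{x_1}\varphi(x)\,[\theta x_1 - \sqrt{1-\theta^2}\,x_2]\,F(dx))^2}{(1-\theta^2)^2 \int \varphi^2\,dF}\ \Big|\ 0 \ne \varphi \in \mathcal{D}_2\Big\} \qquad (5.7)$$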



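Example 5.6 ($k$-dim. scale, $k > 1$). We give both vech expressions and matrix expressions, using symmetrized Kronecker products. We start with unsymmetrized versions.

$$\Lambda^u_\theta(x) = \theta^{-1}\Lambda^u_1(\theta^{-1}x), \qquad \Lambda^u_1(x) = \Lambda_f(x)\,x^\tau - \mathbb{I}_k,$$

$$\Lambda^s_\theta(x) = \tfrac12[\Lambda^u_\theta(x) + \Lambda^u_\theta(x)^\tau], \qquad \Lambda_\theta(x) = \mathrm{vech}[\Lambda^s_\theta(x)],$$

This can also be written as $\Lambda_\theta(x) = \mathrm{vech}[\theta^{-1} \circledast \Lambda^u_1(\theta^{-1}x)]$. In matrix notation this yields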

ye^ii9-'<S)9-')®[J{AfX'-hf^dF], 



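in vector notation $\mathcal{I}_\theta = \int \Lambda_\theta (\Lambda_\theta)^\tau\,dP_\theta$. Working with $a = a^\tau \in \mathbb{R}^{k\times k}$ we get

$$\mathrm{vech}(a)^\tau \Lambda_\theta(x) = \Lambda_f(\theta^{-1}x)^\tau\, \theta^{-1} a\, \theta^{-1} x - \mathrm{tr}(\theta^{-1}a),$$

$$\mathcal{I}_\theta(F,a) = \int \big(\Lambda_f(y)^\tau\, \theta^{-1} a\, y - \mathrm{tr}(\theta^{-1}a)\big)^2\, F(dy)$$

For symmetric $a$, the supremal definition of $\mathcal{I}(F)$ is

$$\mathcal{I}_\theta(F,a) := \sup\Big\{\frac{(\int \nabla\varphi(x)^\tau\, \theta^{-1} a\, x\; F(dx))^2}{\int \varphi^2\,dF}\ \Big|\ 0 \ne \varphi \in \mathcal{D}_k\Big\} \qquad (5.8)$$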



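Example 5.7 ($k$-dim. location and scale, $k > 1$). Partitioning $\Lambda$ into a location block (l) and a scale block (s), we get

$$\Lambda_{l;\theta}(x) = \theta_2^{-1}\Lambda_{l;0,1}(\theta_2^{-1}(x-\theta_1)), \qquad \Lambda_{l;0,1}(x) = \Lambda_f(x),$$
$$\Lambda^u_{s;\theta}(x) = \theta_2^{-1}\Lambda^u_{s;0,1}(\theta_2^{-1}(x-\theta_1)), \qquad \Lambda^u_{s;0,1}(x) = \Lambda_f(x)\,x^\tau - \mathbb{I}_k,$$
$$\Lambda^s_{s;0,1}(x) = (\Lambda^u_{s;0,1}(x) + \Lambda^u_{s;0,1}(x)^\tau)/2, \qquad \Lambda_{s;0,1}(x) = \mathrm{vech}(\Lambda^s_{s;0,1}(x)),$$

$$\mathcal{I}_{l,l;\theta} = \theta_2^{-1}\Big[\int \Lambda_f \Lambda_f^\tau\,dF\Big]\theta_2^{-1},$$

$$\mathcal{I}_{l,s;\theta} = \theta_2^{-1}\Big[\int \Lambda_f\; \mathrm{vech}[\theta_2^{-1} \circledast (\Lambda_f x^\tau - \mathbb{I}_k)]^\tau\,dF\Big],$$

$$\mathcal{I}_{s,s;\theta} = \int \mathrm{vech}[\theta_2^{-1} \circledast (\Lambda_f x^\tau - \mathbb{I}_k)]\; \mathrm{vech}[\theta_2^{-1} \circledast (\Lambda_f x^\tau - \mathbb{I}_k)]^\tau\,dF.$$

Working with $a = (a_l^\tau; \mathrm{vech}(a_s)^\tau)^\tau$, $a_l \in \mathbb{R}^k$, $a_s = a_s^\tau \in \mathbb{R}^{k\times k}$, we get

$$\mathrm{vech}(a)^\tau \Lambda_\theta(x) = a_l^\tau\, \theta_2^{-1}\Lambda_f(\theta_2^{-1}(x-\theta_1)) + \Lambda_f(\theta_2^{-1}(x-\theta_1))^\tau\, \theta_2^{-1} a_s\, \theta_2^{-1}(x-\theta_1) - \mathrm{tr}(\theta_2^{-1}a_s),$$

The supremal definition of $\mathcal{I}(F)$ is

$$\mathcal{I}_\theta(F,a) := \sup\Big\{\frac{(\int \nabla\varphi(x)^\tau\, \theta_2^{-1}(a_l + a_s x)\; F(dx))^2}{\int \varphi^2\,dF}\ \Big|\ 0 \ne \varphi \in \mathcal{D}_k\Big\} \qquad (5.9)$$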

To keep the order of the examples as in Section 3, we place a remark here, concerning Example 3.5.

Remark 5.8. The fact that we are dealing with a one dimensional parameter seems to indicate that it should be possible to treat the problem using only one dimensional densities. Factorizations (5.5) and (5.6) seem to point into the same direction, as they seem to suggest that working with

$$\sigma_1\sqrt{1-\theta^2}\; P_\theta(dx_1, dx_2) = f_{1|2}\Big(\frac{x_1/\sigma_1 - \theta x_2/\sigma_2}{\sqrt{1-\theta^2}},\ \frac{x_2}{\sigma_2}\Big)\,\lambda(dx_1)\,F_2(\sigma_2^{-1}dx_2) \qquad (5.10)$$

instead of (5.4), we could allow for any second marginal $F_2$, possibly even $F_2 \perp \lambda$, and just focus on the conditional densities for each fixed $x_2$ section.

Theorem 4.4, however, excludes that possibility for finite Fisher information. To be fair, one has to admit that anyway, not every $F$ with $P_\theta = \tau_\theta F$ could be allowed for (5.10), but only exactly those achieving this representation. But even then it is of rather marginal interest, as may be seen in the following example: Consider $Y_1 \sim \mathcal{N}(0,1)$, $Y_2 = \pm 1$ with $P(Y_2 = 1) = P(Y_2 = -1) = 1/2$, $Y_1$, $Y_2$ independent and $F := \mathcal{L}(Y_1, Y_2)$. Then for any $\theta \in \Theta_5$, $X = \tau_\theta(Y) = ((1-\theta^2)^{1/2} Y_1 + \theta Y_2,\ Y_2)$, and recovering $\theta$ from observations of $X$ amounts to estimating $E[X_1 \mid X_2 = x_2]$ for $x_2 = \pm 1$, a task falling into the usual $O_P(n^{-1/2})$-type of statistical decision problems; if on the other hand, we take $F = \mathcal{L}(Y_1,\ (1-\alpha^2)^{1/2} Y_1 + \alpha Y_2)$, for any $0 < |\alpha| < 1$, then, for $\theta \ne -(2-\alpha^2)^{-1/2}$, $\mathcal{L}(X)$ is concentrated on two lines $x_2 = \alpha_i + \beta x_1$, $i = 1,2$ with $\beta = (1-\alpha^2)^{1/2} / [(1-\theta^2)^{1/2} + \theta(1-\alpha^2)^{1/2}]$. But as we assume $F$ to be known, knowledge of $\beta$ is just as good as knowledge of $\theta$. Having fixed an observation $X^{(0)}$, $\beta$ may be recovered exactly, as soon as we have found two further observations $X^{(1)}$ and $X^{(2)}$ both lying on the same line as $X^{(0)}$, which will happen in finite time almost surely. Thus here a single observation must have infinite information on $\theta$, which is just according to our theorem.
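The degenerate case in Remark 5.8 can be checked by simulation. The sketch below, with $\sigma_1 = \sigma_2 = 1$ as in the remark, draws from $F = \mathcal{L}(Y_1, (1-\alpha^2)^{1/2}Y_1 + \alpha Y_2)$, applies $\tau_\theta$, and measures how far the intercepts $x_2 - \beta x_1$ are from the two values $\pm\alpha(1 - \beta\theta)$. The intercept formula is our own short computation from the definitions, and the function names, sample size and seed are illustrative assumptions.

```python
import math
import random


def tau(theta, y):
    """Correlation transformation (3.1) with sigma1 = sigma2 = 1."""
    y1, y2 = y
    return (math.sqrt(1.0 - theta**2) * y1 + theta * y2, y2)


def max_line_deviation(theta, alpha, n=2000, seed=1):
    """Largest distance of the simulated intercepts x2 - beta * x1 from +-alpha (1 - beta theta)."""
    rng = random.Random(seed)
    root_a = math.sqrt(1.0 - alpha**2)
    beta = root_a / (math.sqrt(1.0 - theta**2) + theta * root_a)
    offset = abs(alpha * (1.0 - beta * theta))
    worst = 0.0
    for _ in range(n):
        y1 = rng.gauss(0.0, 1.0)
        y2 = rng.choice((-1.0, 1.0))
        x1, x2 = tau(theta, (y1, root_a * y1 + alpha * y2))
        intercept = x2 - beta * x1
        worst = max(worst, abs(abs(intercept) - offset))
    return beta, offset, worst


beta, offset, worst = max_line_deviation(theta=0.5, alpha=0.6)
print(beta, offset, worst)  # worst is at rounding-error level
```

Since every simulated point lies on one of two parallel lines with common slope $\beta$, two points on the same line determine $\beta$, and hence $\theta$, without error.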
6. Consequences for the LAN Approach

In general, finiteness of Fisher information does not imply $L_2$-differentiability without additional assumptions like, e.g., that for $\lambda^k$ almost all $x$ and for all $\rho \in \mathbb{R}^p$ the map $s \mapsto p_{\theta+s\rho}(x)$ is a.c. and the Fisher information $\mathcal{I}_\theta$ is continuous in $\theta$, c.f. Le Cam (1986, 17.3 Prop. 4).

All examples from Section 3, except for the correlation example, Example 3.5, provide more structure, though. They may all be summarized in the (multivariate) location scale model of Example 3.7. First of all, due to the invariance/dilation relations of Lebesgue measure w.r.t. affine transformations, we may limit attention to the reference parameter $(0, \mathbb{I}_k)$. Even more though, we have the following generalization of Lemmas by Hájek (1972) (one-dimensional location) and Swensen (1980, Ch. 2, Sec. 3) to the multivariate location case



Proposition 6.1. Assume that in the multivariate location and scale model 3.7 Fisher information as defined in (5.8) is finite for some parameter value. Then the model is $L_2$-differentiable for any parameter value.



Hence Theorem 4.4 gives a sufficient condition for these models to be $L_2$-differentiable and as a consequence to be LAN.

On the other hand, $L_2$-differentiability requires finiteness of $\mathcal{I}_\theta$, so that in the multivariate location and scale case, for all central distributions $F$, the model with central distribution $F$ is $L_2$-differentiable iff $\sup_a \mathcal{I}_\theta(F;a) < \infty$.

In the i.i.d. setup Le Cam (1986, 17.3 Prop. 2) even shows that $L_2$-differentiability is both necessary and sufficient to get an LAN expansion of the likelihoods in the form

$$\log dP^n_{\theta+h/\sqrt{n}} / dP^n_\theta = \frac{1}{\sqrt{n}} \sum_{i=1}^n h^\tau \Lambda_\theta(x_i) - \frac12 h^\tau \mathcal{I}_\theta h + o_{P^n_\theta}(n^0) \qquad (6.1)$$

with some $\Lambda_\theta \in L_2^p(P_\theta)$ and $0 \prec \mathcal{I}_\theta = E[\Lambda_\theta \Lambda_\theta^\tau] \prec \infty$, so again in the setup of the (multivariate) location scale model of Example 3.7 finiteness of Fisher information is both necessary and sufficient to such an LAN expansion. Altogether we have
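In the one-dimensional Gaussian location model the LAN expansion (6.1) holds without remainder, which makes it a convenient test case. With $F = \mathcal{N}(0,1)$, $\theta = 0$, $\Lambda_0(x) = x$ and $\mathcal{I}_0 = 1$, the sketch below compares the exact log-likelihood ratio of $n$ observations with the right-hand side of (6.1). Function names, sample size and seed are our own illustrative choices.

```python
import math
import random


def log_likelihood_ratio(xs, h):
    """log dP^n_{h / sqrt(n)} / dP^n_0 for i.i.d. N(theta, 1) observations xs."""
    shift = h / math.sqrt(len(xs))
    return sum(-0.5 * (x - shift) ** 2 + 0.5 * x**2 for x in xs)


def lan_expansion(xs, h):
    """Right-hand side of (6.1) with Lambda_0(x) = x and Fisher information 1, remainder dropped."""
    score_sum = sum(xs)
    return h * score_sum / math.sqrt(len(xs)) - 0.5 * h**2


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
h = 1.3
print(log_likelihood_ratio(sample, h) - lan_expansion(sample, h))  # rounding error only
```

For other central distributions the difference is not zero but tends to zero in probability, and by the results of this section this happens in the location scale model exactly when the Fisher information is finite.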



Proposition 6.2. In models 3.1, 3.2, 3.3, 3.4, 3.6, 3.7 the following statements are equivalent

(i) The respective Fisher information from (4.3) is finite for any parameter value,

(ii) Conditions (ii) of Theorem 4.4 hold for any parameter value,

(iii) The model is $L_2$-differentiable for any parameter value,

(iv) The model admits the LAN property (6.1) for any parameter value.

Remark 6.3. The proof uses the translation invariance and the transformation property under dilations of 
k -dimensional Lebesgue measure, so there is not much room for extensions beyond group models induced 
by subgroups of the general affine group. 

7. Minimization of the Fisher information 

Representations (4.2) resp. (4.3) for Fisher information allow for minimization, resp. maximization, of the trace or maxev of $\mathcal{I}_\theta$ w.r.t. the central distribution $P_\theta$ or $F$. In this paper, we settle the questions of (strict) convexity and lower semicontinuity just as in Huber (1981), but replace vague topology used in Huber (1981) by weak topology. This is done in order to establish existence and uniqueness of a minimizing $F_0$ in some suitable neighborhood of the (ideal) model. To this end define for $a \in \mathbb{R}^p$, $\varphi \in \mathcal{D}_k$, $\varphi \circ \tau_\theta \ne 0\ [F]$

$$\mathcal{I}_\theta(F;a;\varphi) := \frac{(\int [\nabla\varphi^\tau D_a + \varphi V_a]\,dP_\theta)^2}{\int \varphi^2\,dP_\theta} \qquad (7.1)$$

and

$$\bar{\mathcal{I}}_\theta(F) := \sup \mathcal{I}_\theta(F;a), \qquad a \in \mathbb{R}^p,\ |a| = 1 \qquad (7.2)$$

7.1. Weak Lower Semicontinuity and Convexity 

To show weak lower semicontinuity and convexity, we use that for fixed $\varphi \in \mathcal{D}_k$, $\varphi \ne 0\ [P_\theta]$, $F \mapsto \mathcal{I}_\theta(F;a;\varphi)$ is weakly continuous (by definition) and convex (by Huber (1981, Lemma 4.4)). Essentially we may then use that the supremum of continuous functions is lower semicontinuous and the supremum of convex functions remains convex; but some subtle additional arguments are needed as the set of $\varphi$'s over which we are maximizing may vary from $F$ to $F$; these can be found in Ruckdeschel and Rieder (2010, Proof to Prop. 2.1). Altogether we have shown

Proposition 7.1. For each $a \in \mathbb{R}^p$, the mapping $F \mapsto \mathcal{I}_\theta(F;a)$ is weakly lower-semicontinuous and convex in $F \in \mathcal{M}_1(\bar{\mathbb{B}}^k)$. The same goes for $F \mapsto \bar{\mathcal{I}}_\theta(F)$.
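The convexity for a fixed test function is easy to observe numerically in the one-dimensional location case, where $D_a = -1$ and $V_a = 0$, so that the functional is $(\int \varphi'\,dF)^2 / \int \varphi^2\,dF$. The sketch below takes $\varphi = \tanh$, which lies in $\mathcal{C}^\infty(\bar{\mathbb{R}}, \mathbb{R})$ because it is a smooth function of $\ell(x)$, and two discrete distributions chosen by us for illustration. It records the gap between the chord and the functional along the mixture $(1-t)F_0 + tF_1$.

```python
import math


def functional(measure):
    """(int phi' dF)^2 / int phi^2 dF for phi = tanh and a discrete F = [(point, weight), ...]."""
    num = sum(w * (1.0 - math.tanh(x) ** 2) for x, w in measure)
    den = sum(w * math.tanh(x) ** 2 for x, w in measure)
    return num * num / den


def mix(f0, f1, t):
    """Discrete representation of the mixture (1 - t) F0 + t F1."""
    return [(x, (1.0 - t) * w) for x, w in f0] + [(x, t * w) for x, w in f1]


F0 = [(0.0, 0.5), (1.0, 0.5)]
F1 = [(-2.0, 0.3), (3.0, 0.7)]

gaps = []
for i in range(11):
    t = i / 10
    chord = (1.0 - t) * functional(F0) + t * functional(F1)
    gaps.append(chord - functional(mix(F0, F1, t)))

print(min(gaps))  # nonnegative up to rounding
```

The underlying reason is that numerator and denominator are linear in $F$ and $(u, v) \mapsto u^2/v$ is jointly convex for $v > 0$; taking the supremum over $\varphi$ preserves convexity.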

Remark 7.2. Using $\bar{\mathbb{R}}^k$ from Definition 2.2 we work with a compact definition space right away, which moreover is endowed with a separable metric, so any subset of probability measures on $\bar{\mathbb{B}}^k$ is tight, hence by Prokhorov's theorem weakly relatively sequentially compact.

Corollary 7.3. In any weakly closed set $\mathcal{F} \subset \mathcal{M}_1(\bar{\mathbb{B}}^k)$, both $\bar{\mathcal{I}}_\theta$ and $\mathcal{I}_\theta(\cdot\,;a)$, for fixed $a$, attain their minimum in some $F_0 \in \mathcal{F}$.

7.2. Strict Convexity — Uniqueness of a Minimizer 



We essentially take over the assumptions of Huber (1981): we fix $\theta \in \Theta$ and consider variations in $F$ of the following form: For $F_i \in \mathcal{M}_1(\bar{\mathbb{B}}^k)$, $i = 0, 1$ consider

$$F_t := (1-t)\,F_0 \circ \iota_\theta + t\,F_1 \circ \iota_\theta \qquad (7.3)$$

We distinguish cases (a) and ($\bar{\mathcal{I}}$), i.e., of a given one-dimensional projection $a \ne 0$, and the corresponding maximal eigenvalue, respectively.

Proposition 7.4. Under assumptions

(a) The set $\mathcal{F}$ of admitted central distributions $F$ is convex.

(b) There is an $F_0 \in \mathcal{F}$ minimizing

(a) $\mathcal{I}_\theta(F;a)$ along $\mathcal{F}$ and $\mathcal{I}_\theta(F_0;a) < \infty$.
($\bar{\mathcal{I}}$) $\bar{\mathcal{I}}_\theta(F)$ along $\mathcal{F}$ and $\bar{\mathcal{I}}_\theta(F_0) < \infty$.

(c) The set where the Lebesgue-density $f_0$ of $F_0$ is strictly positive is convex and contains the support of every $F_t$ derived from some $F_1 \in \mathcal{F}$.

(d) (a) $\lambda^k(\{x \mid a^\tau \partial_\theta \iota_\theta(x) = 0\}) = 0$

($\bar{\mathcal{I}}$) $\lambda^k(\{x \mid \exists a: |a| = 1 \text{ s.t. } a^\tau \partial_\theta \iota_\theta(x) = 0\}) = 0$

the map $F \mapsto \mathcal{I}_\theta(F;a)$ (case (a)) resp. $F \mapsto \bar{\mathcal{I}}_\theta(F)$ (case ($\bar{\mathcal{I}}$)) is strictly convex, hence there is a unique minimizer $F_0$.

Remark 7.5. Assumption (d) holds for the $k$ dimensional location scale model of Example 3.7. For
symmetric a, and the scale part 6, it holds that aj^g^lg = a^d^^x. But for any Oj with la^l = 1, 
dimkerfljSj^' < k— 1 , hence A (kerajS^j"') = 0. 

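7.3. Existence of a Maximizer of $\mathrm{tr}\,\mathcal{I}_\theta^{-1}(F)$

Proposition 7.6. Let $\mathcal{F}$ be a weakly closed subset of $\mathcal{M}_1(\bar{\mathbb{B}}^k)$. Assume that for all $0 \ne a \in \mathbb{R}^p$, $\min_{F \in \mathcal{F}} \mathcal{I}_\theta(F;a) > 0$. Then the function $F \mapsto \mathrm{tr}\,\mathcal{I}_\theta^{-1}(F)$ is weakly upper-semicontinuous on $\mathcal{F}$, and consequently, attains its maximum along $\mathcal{F}$ in some $F_0 \in \mathcal{F}$.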

Appendix A. Functional Analysis and Generalized Differentiability 

Appendix A.1. Dense Functions

Proposition A.1. Let $\mu$ be a $\sigma$-finite measure on $\bar{\mathbb{B}}^k$. Then the set $\mathcal{C}^\infty(\bar{\mathbb{R}}^k, \mathbb{R})$ is dense in any $L_p(\mu)$, $p \in [1, \infty)$. In particular, there is a $c_0 \in (0, \infty)$ s.t. for any $a < b \in \mathbb{R}^k$ and any $\delta > 0$ there is a $\varphi = \varphi_{a,b,\delta} \in \mathcal{C}^\infty(\bar{\mathbb{R}}^k, [0,1])$, with $\varphi = 0$ on $[a-\delta; b+\delta]^c$, $\varphi = 1$ on $[a+\delta; b-\delta]$ and $|\varphi'| \le c_0/\delta$.

Proof: Denseness is a consequence of Lusin's Theorem, compare Rudin (1974, Thm. 3.14). To achieve the universal bound $c_0$, we may use functions $h(t) = (\int_0^t g(s)\,ds) / (\int_0^1 g(u)\,du)$, for $f(t) = e^{-1/t}$, $g(t) = f(t) f(1-t)$. □

Appendix A.2. Absolute Continuity 

We recall the following characterization of absolute continuity [notation a.c] of functions F : R ^^ R 
that can be found in Rudin (1974, Ch. 8). 

Theorem A.2. For F : [a, fc] — > R, a < & G R the following statements L to 3. are equivalent 

1. F is a.c. on [a,b] 

2. (a) F'{x) exists X{dx) a.e. on [a.b] and F'eLi(A| ). 
(b) F{x)-F{a) = J^F'{s))i{ds) for ad x e [a,b] . 

3. There is some ttGLi(A| ) s.t.forxE [a,b], F{x) has the representation 



F(x) =F{a)+ [ u{s)?i{ds) 
J a 



We also recall that a.c. functions are closed under products (Dudley (2002), 7.2 Prob. 4). In particular,
integration by parts is available. In this paper, we call a function $F : \mathbb{R} \to \mathbb{R}$ a.c. if the equivalent statements
1. to 3. from Theorem A.2 are valid for each compact interval $[a,b] \subset \mathbb{R}$.
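As an illustration of Theorem A.2 (an example of ours, not taken from the references), we may check statement 2. for $F(x) = |x|$ on $[-1,1]$ and contrast it with the Cantor function, for which 2.(b) fails.

```latex
% Example: F(x) = |x| on [a,b] = [-1,1].
% F'(x) = sign(x) exists for x \neq 0 and lies in L_1(\lambda|_{[-1,1]}).
\[
  \int_{-1}^{x} \operatorname{sign}(s)\,\lambda(ds)
  = \begin{cases}
      -(x+1) = |x| - 1, & x \le 0,\\
      -1 + x = |x| - 1, & x > 0,
    \end{cases}
  \qquad\text{so } F(x) - F(-1) = \int_{-1}^{x} F'(s)\,\lambda(ds).
\]
% Non-example: the Cantor function C on [0,1] is continuous and
% nondecreasing with C' = 0 \lambda-a.e., but
\[
  C(1) - C(0) = 1 \neq 0 = \int_0^1 C'(s)\,\lambda(ds),
\]
% so 2.(b) fails and C is not a.c.
```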



Appendix A.3. Absolute Continuity in Higher Dimensions

A little care has to be taken about null sets when transferring absolute continuity to higher dimensions.
The next definition is drawn from Simader (2001).

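Definition A.3. A function $f : (\mathbb{R}^k,\mathbb{B}^k) \to (\mathbb{R},\mathbb{B})$ is called absolutely continuous (in $k$ dimensions), if for
every $i = 1,\ldots,k$, there is a set $N_i \in \mathbb{B}^{k-1}$ with $\lambda^{k-1}(N_i) = 0$ s.t. for $y \in N_i^c$, the function $f_{i,y} : (\mathbb{R},\mathbb{B}) \to
(\mathbb{R},\mathbb{B})$, $x \mapsto f_{i,y}(x) = f((x:y)_i)$ is a.c. in the usual sense.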
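To see why the exceptional sets $N_i$ are needed, consider the following example of ours in $k = 2$ dimensions, where absolute continuity of the sections fails on exactly one line per coordinate.

```latex
% Example (k = 2): f(x_1,x_2) := \log(x_1^2 + x_2^2) for (x_1,x_2) \neq 0,
% and f(0,0) := 0.
% Sections in x_1 for fixed y = x_2:
\[
  x \mapsto f((x:y)_1) = \log(x^2 + y^2)
  \quad\text{is } C^\infty \text{ for } y \neq 0,
  \text{ hence a.c. on each compact interval,}
\]
\[
  x \mapsto f((x:0)_1) = \log(x^2)\ (x \neq 0)
  \quad\text{is unbounded near } 0, \text{ hence not a.c.}
\]
% By symmetry the same holds for sections in x_2, so f is a.c. in
% 2 dimensions in the sense of Definition A.3 with
\[
  N_1 = N_2 = \{0\}, \qquad \lambda(N_1) = \lambda(N_2) = 0 .
\]
```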



In the proof of (ii) $\Rightarrow$ (i) in Theorem 4.4 we need the following lemma:




Lemma A.4. Let $f : \mathbb{R}^k \to \mathbb{R}$ be a.c. in $k$ dimensions. Then for each $i = 1,\ldots,k$

$$\lambda^k(\{f = 0,\ \partial_{x_i} f \neq 0\}) = 0 \tag{A.1}$$

Proof: Let $g(z) = I_{\{f=0\}\cap\{\partial_{x_i} f \neq 0\}}(z)$. Then $g \ge 0$ and Tonelli applies, so the section-wise defined
function $h_y(x) := g((x:y)_i)$ is measurable for each $y \in \mathbb{R}^{k-1}$ and defining the possibly infinite integrals
$H(y) := \int h_y(x)\,\lambda(dx)$ we get $\lambda^k(\{f = 0,\ \partial_{x_i} f \neq 0\}) = \int g\,d\lambda^k = \int H(y)\,\lambda^{k-1}(dy)$. But for each $y$ the
instances $x$ where $h_y(x) = 0$, $h_y'(x) \neq 0$ are separated by open one-dim. sets where $h_y \neq 0$, as $h_y(x) = 0$,
$h_y'(x) \neq 0$ implies that for some $\varepsilon > 0$ and all $0 < |x' - x| < \varepsilon$, $|f(x')| > |x' - x|\,|h_y'(x)|/2 > 0$. Hence at most there can
be a countable number of such $x$, and thus $H(y) = 0$ for each $y$. □

Appendix A.4. Weak Differentiability 

For proving absolute continuity in Theorem 4.4 we have worked with the notion of weak differentiability;
to this end we compile the following definitions and propositions again drawn from Simader (2001),
which we have specialized to differentiation of order one.

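Definition A.5. Let $u \in L_{1,\mathrm{loc}}(\lambda^k)$, $1 \le i \le k$. Then $v_i \in L_{1,\mathrm{loc}}(\lambda^k)$ is called weak derivative of $u$ (with
respect to $x_i$), denoted by $\partial_{x_i} u$, if

$$\int u\,\partial_{x_i}\varphi\,d\lambda^k = -\int v_i\,\varphi\,d\lambda^k \qquad \forall \varphi \in \mathcal{C}_c^\infty(\mathbb{R}^k,\mathbb{R}) \tag{A.2}$$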
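A standard one-dimensional check of (A.2), added here as an illustration of ours: $u(x) = |x|$ has weak derivative $\operatorname{sign}(x)$, whereas the Heaviside function has none in $L_{1,\mathrm{loc}}(\lambda)$.

```latex
% Example (k = 1): u(x) = |x|, candidate v(x) = sign(x).
% For any smooth test function \varphi with compact support,
% integration by parts on each half-line gives
\[
  \int |x|\,\varphi'(x)\,\lambda(dx)
  = \int_0^\infty x\,\varphi'(x)\,dx - \int_{-\infty}^0 x\,\varphi'(x)\,dx
  = -\int_0^\infty \varphi\,dx + \int_{-\infty}^0 \varphi\,dx
  = -\int \operatorname{sign}(x)\,\varphi(x)\,\lambda(dx),
\]
% which is (A.2) with v = sign.
% Non-example: for the Heaviside function H = I_{(0,\infty)},
\[
  \int H\,\varphi'\,d\lambda = -\varphi(0),
\]
% and no v \in L_{1,loc}(\lambda) satisfies \int v\varphi\,d\lambda = \varphi(0)
% for all test functions \varphi.
```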



Remark A.6. (a) The weak derivative is unique, as for the difference $d = v_i - \tilde v_i$ of two potential
candidates, we have $\int d\,\varphi\,d\lambda^k = 0$ for all $\varphi \in \mathcal{C}_c^\infty(\mathbb{R}^k,\mathbb{R})$, so by Proposition A.1 $d$ must be $0$ $[\lambda^k]$.

(b) Weak derivatives belonging to $L_2(\lambda^k)$ give rise to the space $W_{2,1} = W_{2,1}(\mathbb{R}^k)$ of all functions $f :
\mathbb{R}^k \to \mathbb{R}$ with weak derivatives in $L_2(\lambda^k)$ of order one endowed with the norm $\|f\|^2 := \sum_{i=1}^k \|\partial_{x_i} f\|^2_{L_2(\lambda^k)}$,
which is called Sobolev space of order 2 and 1 for which there is a rich theory.

(c) The following two propositions may also be found in Maz'ya (1985, Thm.'s 1 and 2), however under the additional
requirement that the weak resp. classical gradient of $f$ be in $L_2(\lambda^k)$.

Proposition A.7. Let $f \in L_{1,\mathrm{loc}}(\lambda^k)$ with a weak gradient $\nabla f$. Then there is some $\tilde f$, a.c. in $k$ dimensions
with usual gradient $\nabla\tilde f$, such that, up to a $\lambda^k$-null set, $f = \tilde f$ and $\nabla f = \nabla\tilde f$.

Proof: Let again $\Omega_m = [-m,m]^k$ and consider $\chi_m \in \mathcal{D}_k$ with $0 \le \chi_m \le 1$, $\chi_m = 0$ on $\Omega_{m+1}^c$, $\chi_m = 1$ on
$\Omega_m$, and let $f_m = f\chi_m$. Then $f_m \in L_1(\lambda^k)$ and we have for any $\phi \in \mathcal{D}_k$



-Jfmd,,,(l>dX'' = - l'xm(l>dxJdX'' = j fdxXXm^)dX'' = j f{^dx,Xm+Xmdx:^) 



dX'' 



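so that $f_m$ is weakly differentiable and $\partial_{x_i} f_m = \chi_m\,\partial_{x_i} f + f\,\partial_{x_i}\chi_m \in L_1(\lambda^k)$. By Fubini we obtain some
$N_{m,i} \in \mathbb{B}^{k-1}$ with $\lambda^{k-1}(N_{m,i}) = 0$ such that $v_m : \mathbb{R}^{k-1} \to \mathbb{R}$ defined as

$$v_m(y) := \int |\partial_{x_i} f_m((t:y)_i)|\,\lambda(dt) \ \text{ for } y \in N_{m,i}^c \text{ and } 0 \text{ else}$$

is finite for $y \in \mathbb{R}^{k-1}$, lies in $L_1(\lambda^{k-1})$ and $\int_{\mathbb{R}^{k-1}} v_m\,d\lambda^{k-1} = \|\partial_{x_i} f_m\|_{L_1(\lambda^k)}$. Thus we may define for $x \in \mathbb{R}$

$$F_m((x:y)_i) := \int_{-\infty}^x \partial_{x_i} f_m((t:y)_i)\,\lambda(dt) \ \text{ for } y \in N_{m,i}^c \text{ and } 0 \text{ else} \tag{A.3}$$

Apparently, $F_m \in L_{1,\mathrm{loc}}(\lambda^k)$ and for $y \in N_{m,i}^c$, $x \mapsto F_m((x:y)_i)$ is a.c. Let $\phi \in \mathcal{D}_k$; then Fubini yields

$$I := \int \phi\,F_m\,d\lambda^k = \int\!\!\int \partial_{x_i} f_m((t:y)_i) \int I_{\{x>t\}}\,\phi((x:y)_i)\,\lambda(dx)\,\lambda(dt)\,\lambda^{k-1}(dy)$$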
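So far we do not know if the inner integral on the RHS is in $L_1(\lambda^k)$, so another localization argument is
needed. To this end let $\psi \in \mathcal{D}_k$, $\psi = 1$ on $\Omega_{m+1}$, $\psi = 0$ on $\Omega_{m+2}^c$; then as $f_m = \partial_{x_i} f_m = 0$ on $\Omega_{m+1}^c$, we
have $f_m = f_m\psi$, $\partial_{x_i} f_m = \psi\,\partial_{x_i} f_m$, and $f_m\,\partial_{x_i}\psi = 0$. With $u = (t:y)_i$ define the function

$$\varphi(u) := \psi(u) \int I_{\{x>t\}}\,\phi((x:y)_i)\,\lambda(dx),$$

which clearly lies in $\mathcal{D}_k$. Fubini and the definition of weak differentiability entail

$$I = \int\!\!\int \partial_{x_i} f_m((t:y)_i)\,\varphi((t:y)_i)\,\lambda(dt)\,\lambda^{k-1}(dy) = \int \varphi\,\partial_{x_i} f_m\,d\lambda^k = -\int f_m\,\partial_{x_i}\varphi\,d\lambda^k$$

But $\partial_{x_i}\varphi(u) = \partial_{x_i}\psi(u) \int I_{\{x>t\}}(x)\,\phi((x:y)_i)\,\lambda(dx) - \psi(u)\phi(u)$, so, as $f_m\,\partial_{x_i}\psi = 0$, $f_m = f_m\psi$, we get

$$\int \phi\,F_m\,d\lambda^k = \int f_m\,\psi\,\phi\,d\lambda^k = \int \phi\,f_m\,d\lambda^k. \tag{A.4}$$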



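Because $\phi$ was arbitrary in $\mathcal{D}_k$, $F_m = f_m$ $[\lambda^k]$, and by letting $m \to \infty$ we may extend this to $\mathbb{R}^k$. Fubini
then provides a $\lambda^k$-null set $S_i$ and a $\lambda^{k-1}$-null set $\tilde S_i$ s.t. for $y \in \tilde S_i^c$ the projection set $S_i^y := \{x \in \mathbb{R} : (x:y)_i \in S_i\}$ has $\lambda$-measure
0. Let $N_i := \bigcup_m N_{m,i}$; then $\lambda^{k-1}(N_i) = 0$, and for $y \in N_i^c$ the functions $x \mapsto F_m((x:y)_i)$ are a.c.,
hence continuous in particular. For $y \in (N_i \cup \tilde S_i)^c$, $x \in (S_i^y)^c$ and $j \in \mathbb{N}$, even $F_m((x:y)_i) = F_{m+j}((x:y)_i)$,
and hence by continuity, for all $y \in (N_i \cup \tilde S_i)^c$, $F_m((x:y)_i) = F_{m+j}((x:y)_i)$ for all $x$. Hence, writing again
$u = (t:y)_i$, this gives a unique function $\tilde f_i \in L_{1,\mathrm{loc}}(\lambda^k)$ defined as

$$\tilde f_i(u) := \begin{cases} \lim_m F_m(u) & \text{for } u \in \mathbb{R}^k,\ y \in (N_i \cup \tilde S_i)^c \\ 0 & \text{else} \end{cases} \tag{A.5}$$

s.t. $\tilde f_i$ is a.c. w.r.t. $x_i$ in the sense that there is a $\lambda^{k-1}$-null set $N_i$ s.t. for $y \in N_i^c$ the function $x \mapsto \tilde f_i((x:y)_i)$
is a.c. By construction, $\lambda^k(\{\tilde f_i \neq f\} \cup \{\partial_{x_i}\tilde f_i \neq \partial_{x_i} f\}) = 0$. Applying this argument for each $i = 1,\ldots,k$,
we see that there is a function $\tilde f$ which is a.c. in $k$ dimensions, s.t. $\lambda^k(\{\tilde f \neq f\} \cup \{\nabla\tilde f \neq \nabla f\}) = 0$. □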

Proposition A.8. Let $f \in L_{1,\mathrm{loc}}(\lambda^k)$ be a.c. in $k$ dimensions. If its classical partial derivatives $\partial_{x_i} f$ are
extended by 0 on those lines where absolute continuity fails, and the so extended gradient belongs to
$L_{1,\mathrm{loc}}(\lambda^k)$, then there is a weak gradient of $f$ and the extended gradient can be taken as a version of the
weak gradient.

Proof: As $f$ is a.c. in $k$ dimensions there exist $N_i \in \mathbb{B}^{k-1}$ such that for $y \in N_i^c$ the functions $x \mapsto f((x:y)_i)$
are a.c. Let $\phi \in \mathcal{D}_k$ and $y \in N_i^c$. Then $x \mapsto \phi((x:y)_i) \in \mathcal{D}_1$ and thus by integration by parts, for $y \in N_i^c$,
we have

$$\int_{\mathbb{R}} f((x:y)_i)\,\partial_{x_i}\phi((x:y)_i)\,\lambda(dx) = -\int_{\mathbb{R}} \phi((x:y)_i)\,\partial_{x_i} f((x:y)_i)\,\lambda(dx) \tag{A.6}$$

Obviously, extending $f\,\partial_{x_i}\phi$, $\phi\,\partial_{x_i} f$ by 0 on $y \in N_i$, these two functions belong to $L_1(\lambda^k)$. Fubini thus
yields a set $\tilde N_i \in \mathbb{B}^{k-1}$, $\lambda^{k-1}(\tilde N_i) = 0$, s.t. for $y \in \tilde N_i^c$, $x \mapsto [f\,\partial_{x_i}\phi]((x:y)_i)$, $x \mapsto [\phi\,\partial_{x_i} f]((x:y)_i)$ belong to
$L_1(\lambda)$. Hence by Fubini $\int_{\mathbb{R}^k} f\,\partial_{x_i}\phi\,d\lambda^k = -\int_{\mathbb{R}^k} \phi\,\partial_{x_i} f\,d\lambda^k$. As $\partial_{x_i} f \in L_{1,\mathrm{loc}}(\lambda^k)$ by definition of absolute
continuity in $k$ dimensions, this possibly extended $\partial_{x_i} f$ is a weak derivative of $f$. □

Remark A.9. Having this "almost" coinciding of weak differentiability and absolute continuity in $k$ dimensions
in mind, we drop the notational difference of weak and classical derivatives.
Appendix B. Proofs

Appendix B.1. Preparations

Before proving Theorem 4.4 some preparations are needed.

We want to parallel the proof given in Huber (1981) credited to T. Liggett: The idea is to define for
given $a \in \mathbb{R}^p$ linear functionals $\tilde T_{a;i}$ on the dense subset $\mathcal{C}_c^\infty(\mathbb{R}^k,\mathbb{R})$ of $L_2(P_\theta)$ as

$$\tilde T_{a;i} : \mathcal{C}_c^\infty(\mathbb{R}^k,\mathbb{R}) \to \mathbb{R}, \qquad \tilde T_{a;i}(\varphi) := \int D_{a;i}\,\partial_{x_i}\varphi\,dP_\theta. \tag{B.1}$$



Remark B.1. As also true for the one-dimensional location model treated in Huber (1981) and in the
one-dimensional scale model in Ruckdeschel and Rieder (2010), it is not clear a priori whether this is a sound
definition, i.e., whether $\tilde T_{a;i}$ respects equivalence classes of functions in $L_2(P_\theta)$:

As by (Dk) resp. (D1), $D_{a;i}$ is continuously differentiable, it is bounded on compacts, hence $D_{a;i}\,\partial_{x_i}\varphi \in
L_1(P_\theta)$ for any $\varphi \in \mathcal{C}_c^\infty(\mathbb{R}^k,\mathbb{R})$; but even then, it is still not clear whether (B.1) makes a definition: Take
$x^{(0)} \in \mathbb{R}^k$ so that $x_i^{(0)} D_{a;i}(x^{(0)}) \neq 0$ and $P_\theta$ Dirac measure for $\{x^{(0)}\}$.

Then obviously, $\varphi_0(x) = (x^{(0)})^\tau (x - x^{(0)}) = 0$ $P_\theta(dx)$-a.e., but it also holds that $\partial_{x_i}\varphi_0(x)\,D_{a;i}(x) = x_i^{(0)} D_{a;i}(x^{(0)})$
$P_\theta(dx)$-a.e., with the consequence that, although $\varphi_0 = 0$ $[P_\theta]$, $\tilde T_{a;i}(\varphi_0) \neq \tilde T_{a;i}(0) = 0$. Of course, $\varphi_0$ must
be modified away from $x^{(0)}$ to some $\tilde\varphi$ s.t. $\tilde\varphi$ belongs to $\mathcal{C}_c^\infty(\mathbb{R}^k,\mathbb{R})$. Luckily enough, this case cannot
occur under condition (i) of Theorem 4.4 as then

$$\varphi = 0\ [P_\theta] \implies \tilde T_{a;i}(\varphi) = \int \partial_{x_i}\varphi\,D_{a;i}\,dP_\theta = 0 \tag{B.2}$$

which may be proved just along the lines of the first paragraph of Ruckdeschel and Rieder (2010, Proof
to Thm. 2.2). Due to linearity of differentiation, evaluated member-wise in a $P_\theta$-equivalence class, this
shows that $\tilde T_{a;i}$ respects $P_\theta$-equivalence classes.
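For concreteness, the Dirac counterexample of this remark may be spelled out in dimension $k = 1$. The values $x^{(0)} = 1$ and $D_{a;1}(1) = d \neq 0$ below are assumptions of ours chosen only for illustration, as is the cut-off function $\chi$.

```latex
% Assumptions: k = 1, P_\theta = \delta_{1} (Dirac measure at x^{(0)} = 1),
% D_{a;1}(1) = d \neq 0; \chi is a hypothetical smooth, compactly supported
% function with \chi = 1 in a neighbourhood of 1.
\[
  \varphi_0(x) = x^{(0)}\,(x - x^{(0)}) = x - 1,
  \qquad
  \tilde\varphi := \varphi_0\,\chi .
\]
% \tilde\varphi is smooth with compact support and agrees with \varphi_0
% near 1, so
\[
  \tilde\varphi(1) = 0 \quad\text{i.e. } \tilde\varphi = 0\ [P_\theta],
  \qquad\text{but}\qquad
  \int D_{a;1}\,\tilde\varphi'\,dP_\theta = D_{a;1}(1)\,\tilde\varphi'(1) = d \neq 0 .
\]
% Hence the functional in (B.1) would assign a non-zero value to a
% function that is zero P_\theta-a.e.
```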

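Next we need a lemma showing denseness of certain sets in suitable $L_2$'s. To do so we define for
$i,j = 1,\ldots,k$

$$\mathcal{D}_{D;i,j} := \{D_{a;j}\,\partial_{x_i}\phi \mid \phi \in \mathcal{D}_k,\ a \in \mathbb{R}^p\} \tag{B.3}$$

and recalling that $K_j := \{x : D_{\cdot;j} = 0\}$ we introduce the corresponding decompositions

$$P_\theta := P_\theta^{(0)} + P_\theta^{(1)}, \qquad P_\theta^{(0)}(\cdot) := P_\theta(\cdot \cap K_j). \tag{B.4}$$

Lemma B.2. $\mathcal{D}_{D;i,j}$ is dense in $L_2(Q)$ for any $\sigma$-finite measure $Q$ on $\mathbb{B}^k \cap K_j^c$. In particular it is
measure-determining for $\mathbb{B}^k \cap K_j^c$.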

Proof: Approximating $f \in L_2(Q)$ in $L_2(Q)$ by $f_n := f\,I_{\Omega_n}$ with $\Omega_n = [-n,n]^k$, we may restrict ourselves
to $f_N$ for $N$ sufficiently large. Thus we have to show that for each interval $J \subset K_j^c$ and each $\varepsilon > 0$, there
is a $\varphi$ in $\mathcal{D}_{D;i,j}$ with $\|I_J - \varphi\|_{L_2(Q)} < \varepsilon$.

To this end fix $a \in \mathbb{R}^p$; as $D_{a;j}$ is continuous, the set $K_j^c$ is open, hence is the countable union of $k$-dimensional
intervals $J_m := (l^{(m)}; r^{(m)})$, $m \in \mathbb{N}$, with $l^{(m)} < r^{(m)}$ and $|D_{a;j}| > 0$ on $J_m$.
So it suffices to show that any indicator to an interval $I = [l;r]$, $I \subset J_m$, with endpoints s.t. $Q(\partial I) = 0$
may be approximated in $L_2(Q)$ by functions in $\mathcal{D}_{D;i,j}$. But, for given $\varepsilon > 0$, Proposition A.1 provides
an element $\varphi_0 \in \mathcal{C}_c^\infty(\mathbb{R}^k,\mathbb{R})$ such that $\|\varphi_0 - I_I\|_Q < \varepsilon$. By construction its anti-derivative $\psi_0((x:y)_i) :=
\int I(l_i^{(m)} < z < x)\,\varphi_0((z:y)_i)/D_{a;j}((z:y)_i)\,\lambda(dz)$ lies in $\mathcal{C}_c^\infty(\mathbb{R}^k,\mathbb{R})$, hence $\varphi_0$ in $\mathcal{D}_{D;i,j}$. In particular we
may approximate the $Q$ measure for $k$-dimensional intervals disjoint to $K_j$, which determines $Q$. □
Appendix B.2. Proof of the Main Theorem

(ii) $\Rightarrow$ (i) of Theorem 4.4

In order to avoid specializing the case $k = 1$, define V :=Q there.

Fix $a \in \mathbb{R}^p$, $|a| = 1$. $f$ being a density, $\lambda^k(\{f = 0,\ \partial_{x_i} f \neq 0\}) = 0$ for each $i$ and we may write

{d,,(p)DajdPg = [ {d,,<p)DajdPg = [ f(5,,,(p)Z)„;,]oTe rfF^"li"Y [(5,,,(p)Z)„;,]o-re/dl* = 

-L 



{d„<p)Dajfe |det5,le|rfl* W - / (pfgd,,{D„,i\detd,l0\) + (pD„.j\detd,l0\d,JedV 
5,-ddet,?,le|0„;,) D„.id,Je i^(**) f d,.{\deld^le\Da-i) , D„.jd^,, 



L'PP' [ |det5,te| + ^^^ '^ = - ,/ '^^^ t 



det5,.ie| fe ' .r'"' |det5,ie| /e 



rfA*^ 



In equation (*) we use that by (ii)(c), on $K^c$, $f$ is a.c. $\lambda^k$ a.e. so that integration by
parts is available without having to care about border values due to (ii)(b). By (ii)(d) the resulting integrand
on the RHS of (*) is in $L_1(P_\theta)$. In equation (**), we used the fact that in each expression considered
above, there appears at least one $D_{a;i}$ or a derivative $\partial_{x_i} D_{a;i}$; Lemma A.4 applies and hence

$$\lambda^k(\{D_{a;i} = 0,\ \partial_{x_i} D_{a;i} \neq 0\}) = 0$$

Representations (4.5) and (4.6):

Writing out $\partial_x f_\theta = (\partial_x \iota_\theta)\,(\partial_x f)\circ\iota_\theta$, we see that

Dl..dxfe=a'{deie)J{dxie){dxf)oie=a'{deie){dxf)oig=a'defe (B.5) 

Thus we get 

Lidx,lDa;i\<ie-tdxle\] DlAfe ^, Dcf^i' a^(?e|det3.,-t9| DlAfe ^ a''de\Ae\.dxlg\ a'^defe _^tj^ 
\deidxie\ fe " |deto>^ie| /e |det5,ie| /e " ^' 

so $\Lambda_\theta \in L_2^p(P_\theta)$ by (ii)(d) and hence

(jv((>'Da + (pVadPe\ ^fjcp^^^^^dPe) < j[a'AefdPe jip^dPe, 

which shows that $\mathcal{I}_\theta(F;a) \le \int (a^\tau\Lambda_\theta)^2\,dP_\theta$. The upper bound may be approximated by a sequence
$\varphi_n \in \mathcal{D}_k$ tending to $a^\tau\Lambda_\theta$ in $L_2(P_\theta)$, entailing (4.5) and (4.6).
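The approximation step can be sketched as follows. This is our own spelling-out of the Cauchy-Schwarz argument, writing $g := a^\tau\Lambda_\theta$ as a shorthand and assuming $0 < \int g^2\,dP_\theta < \infty$.

```latex
% Shorthand (ours): g := a^\tau \Lambda_\theta \in L_2(P_\theta).
% Cauchy-Schwarz gives, for every \varphi with \int \varphi^2 dP_\theta > 0,
\[
  \frac{\big(\int \varphi\,g\,dP_\theta\big)^2}{\int \varphi^2\,dP_\theta}
  \le \int g^2\,dP_\theta ,
\]
% with equality for \varphi = g. If \varphi_n \to g in L_2(P_\theta), then
% \int \varphi_n g\,dP_\theta \to \int g^2\,dP_\theta and
% \int \varphi_n^2\,dP_\theta \to \int g^2\,dP_\theta, hence
\[
  \frac{\big(\int \varphi_n\,g\,dP_\theta\big)^2}{\int \varphi_n^2\,dP_\theta}
  \longrightarrow
  \frac{\big(\int g^2\,dP_\theta\big)^2}{\int g^2\,dP_\theta}
  = \int g^2\,dP_\theta ,
\]
% so the upper bound is attained in the limit along the sequence.
```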

(i) $\Rightarrow$ (ii) in Theorem 4.4

We will give a proof largely paralleling Huber (1981), although we may skip some of his arguments.
Well defined operators and Riesz-Fréchet: We consider the linear functionals $\tilde T_{a;i}$ from (B.1), defined on
the dense subset $\mathcal{C}_c^\infty(\mathbb{R}^k,\mathbb{R})$ of $L_2(P_\theta)$, which are well defined due to (B.2). In particular $\tilde T_{a;i}$ are bounded
linear operators with squared operator norms bounded by $\mathcal{I}_\theta(F;a)$, hence can be extended by continuity
to continuous linear operators $T_{a;i} : L_2(P_\theta) \to \mathbb{R}$ with the same operator norms. Thus Riesz-Fréchet applies,
yielding generating elements $g_{a;i} \in L_2(P_\theta)$ s.t.

$$T_{a;i}(\varphi) = -\int g_{a;i}\,\varphi\,dP_\theta \quad \forall \varphi \in L_2(P_\theta) \qquad\text{and}\qquad \|g_{a;i}\|_{L_2(P_\theta)} = \|\tilde T_{a;i}\| \tag{B.6}$$

We conclude inductively for $i = 1,\ldots,k$.

$i = 1$

Using Fubini: We have for $\varphi \in \mathcal{D}_k$

$$T_{a;1}(\varphi) = \int D_{a;1}\,\partial_{x_1}\varphi\,dP_\theta \tag{B.7}$$
On the other hand by the fundamental theorem of calculus,

$$T_{a;1}(\varphi) = -\int \varphi\,g_{a;1}\,dP_\theta = \int\!\!\int I_{\{x_1 > y_1\}}\,\partial_{x_1}\varphi(x_1,y_{2:k})\,\lambda(dx_1)\,g_{a;1}(y)\,P_\theta(dy).$$

Now for each compact $A$, the integrand $h_1(x_1,y;A) = I_A(x_1)\,I_{\{x_1>y_1\}}\,g_{a;1}(y)$ is in $L_1(\lambda(dx_1) \otimes P_\theta(dy))$.
Fubini for Markov kernels thus yields a $\lambda \otimes P_{\theta;2:k}$-null set $N_1$ such that for $(x_1,y_{2:k}) \in N_1^c$, $x_1 \in A$, the
function $\tilde h_1(y_1) := h_1(x_1,y;A)$ belongs to $L_1(P_{\theta;1|2:k}(dy_1|y_{2:k}))$. We now define for $D_{a;1} \neq 0$ the function
$p_{1|2:k}^{[a;1;A]}$ as

$$[D_{a;1}\,p_{1|2:k}^{[a;1;A]}](x_1,y_{2:k}) := \begin{cases} \int \tilde h_1(y_1)\,P_{\theta;1|2:k}(dy_1|y_{2:k}) & \text{for } (x_1,y_{2:k}) \in N_1^c,\ x_1 \in A \\ 0 & \text{else} \end{cases} \tag{B.8}$$

where obviously the dependence in $A$ is such that for another compact $A' \supset A$, $p_{1|2:k}^{[a;1;A]} = p_{1|2:k}^{[a;1;A']}\,I_A(x_1)$.
Hence for arbitrary $x_1$ take $A$ such that $x_1 \in A$ and eliminate the index $A$ in the superscript where it is
clear from the context. We also note that by Cauchy-Schwarz, for $P_{\theta;2:k}(dy_{2:k})$-a.e. $y_{2:k}$,

$$\big|[D_{a;1}\,p_{1|2:k}^{[a;1]}](x_1,y_{2:k})\big|^2 \le \int g_{a;1}(y)^2\,P_{\theta;1|2:k}(dy_1|y_{2:k}) < \infty \tag{B.9}$$

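Getting rid of the dependence on $a$: To understand how $p_{1|2:k}^{[a;1]}$ is related to $p_{1|2:k}^{[a';1]}$ for $0 \neq a' \in \mathbb{R}^p$,
we consider again (B.1), (B.8): Both sides of the latter must be of form $W^\tau a$ for some $\mathbb{R}^p$-valued $W$
independent of $a$; in particular

$$g_{a;1} = w_1^\tau a \tag{B.10}$$

for some $w_1 \in L_2^p(P_\theta)$. Hence

$$p_{1|2:k}^{[a;1]} = p_{1|2:k}^{[a';1]} \quad\text{on } \{D_{a;1} \neq 0\} \cap \{D_{a';1} \neq 0\} \tag{B.11}$$

and, as we only need $p$ orthogonal values of $a$ to specify $w_1$, we arrive at a maximally extended $p_{1|2:k}^{[1]}$
defined on $K_1^c = \{D_{\cdot;1} \neq 0\}$. Also, for $P_{\theta;2:k}(dy_{2:k})$-a.e. $y_{2:k}$,

$$\big|[D_{a;1}\,p_{1|2:k}^{[1]}](x_1,y_{2:k})\big|^2 \le |a|^2 \int |w_1(y)|^2\,P_{\theta;1|2:k}(dy_1|y_{2:k}) \tag{B.12}$$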

$p_{1|2:k}^{[1]}$ is a density: Plugging in this maximal definition, we get for $\varphi \in \mathcal{D}_k$, using $A = \operatorname{supp}(\varphi)$,

$$T_{a;1}(\varphi) = \int D_{a;1}\,\partial_{x_1}\varphi\,dP_\theta = \int\!\!\int [D_{a;1}\,\partial_{x_1}\varphi\,p_{1|2:k}^{[1]}](x_1,y_{2:k})\,\lambda(dx_1)\,P_{\theta;2:k}(dy_{2:k}). \tag{B.13}$$

where integrability of the integrands follows from Remark 2.3(e) and (C1)/(Ck), and for the right one
from (B.12), which also entails that $P_{\theta;2:k}(dy_{2:k})$-a.e., $x_1 \mapsto D_{a;1}\,p_{1|2:k}^{[1]}$ is the $\lambda$-density of a $\sigma$-finite
signed measure. Hence, we have shown that $P_\theta(dx_1,dy_{2:k})$ and $p_{1|2:k}^{[1]}(x_1,y_{2:k})\,\lambda(dx_1)\,P_{\theta;2:k}(dy_{2:k})$ when
restricted to $K_1^c$ define the same functional on the set $\mathcal{D}_{D;1,1}$, which is measure-determining for $\mathbb{B}^k \cap K_1^c$
due to Lemma B.2.



Therefore, the restriction to compacts $A$ can be dropped entirely, and we may work with $A = \mathbb{R}$. Using
Fubini once again, we see that on $K_1^c$, there is a $P_{\theta;2:k}(dy_{2:k})$-null set $N_1$, s.t. for fixed $y_{2:k} \in N_1^c$, the
function $p_{1|2:k}^{[1]}(x_1,y_{2:k})$ is a Lebesgue density of the regular conditional distribution $P_{\theta;1|2:k}(dx_1|y_{2:k})$,
hence non-negative and in $L_1(\lambda)$.

Replacing $K_1$ by $K$: Similarly as for the dependence on $a$, we may extend the definition of $p_{1|2:k}^{[1]}(x_1,y_{2:k})$
to the set $K^c$: Any $\partial_{x_j}\phi$ for $\phi \in \mathcal{D}_k$ may also be interpreted as $\partial_{x_1}\tilde\phi$ for some $\tilde\phi \in \mathcal{D}_k$. More specifically,
$\tilde\phi = \phi \circ \pi_{1,j}$ with $\pi_{i,j}$ the permutation of coordinates $i$ and $j$. Thus introducing for $1 \le i,j \le k$ operators
$\tilde T_{a;i,j} : \mathcal{D}_k \to \mathbb{R}$, $\varphi \mapsto \int \partial_{x_i}\varphi\,D_{a;j}\,dP_\theta$, we amply see their boundedness in operator norm by $\|T_{a;j}\|$, hence
extending them to $L_2(P_\theta)$ as before, giving operators $T_{a;i,j}$, we also get generating elements $g_{a;i,j} \in L_2(P_\theta)$
by Riesz-Fréchet and eventually, using denseness of $\mathcal{D}_{D;1,j}$ in $L_2(P_\theta)$, we obtain correspondingly defined
$p_{1|2:k}^{[j]}$ for $j = 1,\ldots,k$. Now $p_{1|2:k}^{[j]}$ being Lebesgue densities of $P_{\theta;1|2:k}(dx_1|y_{2:k})$, there is a $P_{\theta;2:k}$-null
set (for simplicity again $N_1$) such that for $y \in N_1^c$, for each pair $j_1 \neq j_2$,

$$p_{1|2:k}^{[j_1]}((x:y)_1) = p_{1|2:k}^{[j_2]}((x:y)_1) \quad [\lambda(dx)] \quad\text{on } K_{j_1}^c \cap K_{j_2}^c \tag{B.14}$$

so we may indeed speak of a maximally extended $p_{1|2:k}$ defined on $K^c$.



$i - 1 \to i$

Assume we have already shown that there is a $P_{\theta;i:k}$-null set $N_{i-1}$ such that for $y_{i:k} \in N_{i-1}^c$,
$P_\theta$ admits some conditional density,

$$p_{1:i-1|i:k}(x_{1:i-1},y_{i:k})\,\lambda^{i-1}(dx_{1:i-1}) = P_{\theta;1:i-1|i:k}(dx_{1:i-1}|y_{i:k}). \tag{B.15}$$

Arguing just as for $i = 1$, we get

$$T_{a;i}(\varphi) = \int D_{a;i}\,\partial_{x_i}\varphi\,dP_\theta = \int I_{\{D_{a;i} \neq 0\}}\,D_{a;i}\,\partial_{x_i}\varphi\,dP_\theta = -\int \varphi\,g_{a;i}\,dP_\theta.$$

Thus using the induction assumptions we proceed as before, i.e., we define $h_i(x_i,y;A)$ for some compact $A$,
the section-wise defined function $\tilde h_i(y_i) := h_i(x_i,y;A)$, and the function $p_{1:i|i+1:k}^{[a;i;A]}$, which extends to $\mathbb{R}$ giving
$p_{1:i|i+1:k}^{[a;i]}$, and where the dependence on $a$ may be dropped, giving $p_{1:i|i+1:k}^{[i]}$. As this defines the same
functional on the set $\mathcal{D}_{D;i,i}$ as the Markov kernel $P_{\theta;1:i|i+1:k}$, by Lemma B.2 $p_{1:i|i+1:k}^{[i]}$ is a conditional
density defined on $K_i^c$. Using the coordinate permutation argument to drop the dependence on $K_i$, we
obtain $p_{1:i|i+1:k}$ defined on $K^c$.

Hence the induction is complete, and we have shown that $P_\theta$ admits a $\lambda^k$ density $p_\theta$ which we
denote by

$$p_\theta(x) := p_\theta(x_{1:k}) := p_{\theta;1:k}(x_{1:k}) := p_{1:k}(x_{1:k}).$$

Showing ga-i = 0[Pq ] : Writing l IB.St and its analogue for general ; for any fixed a e R'' with Pl and 
Pg , we see that by Fubini, for y outside a Pe--, -null set, 

lDa;iPl:i\i+l:kK{x-y)i) ■= f_Ja-Mz-y)i) Pe{(z.y)i)X{dz) + ya;i{{^.y)i) (B.16) 



with 



Ya;i({^:y)i) ■■= l\.A(z:y)i)P^^}_^{dz\y) 



(B.17) 



We next show that for fixed a e R'"" and fixed y outside a Pg-^-j -null set, the value of Ya-J = : 

To this end we show that for any Borel subset S of AT or equivalently for any proper or improper interval 

l = [l,r]cK, 

lsa-Aiz--y)i)P^^ljdz\y)=0 

Of course, Jj dPg.. _.(dz|j) = 0, [Pg--i] ■ Consider ((>„ e iFj. with < ^„ < 1 , (j>n = l on / and (j),, = for 

{x e R*^ I dist(x, /) > 1 /«} , and | d^^ (j),, \ <6n. The last bound is chosen according to the bound | (p | < 2c() 5 
from Proposition lA.il Then <j>„ -^ 1/ pointwise, hence by dominated convergence and Cauchy-Schwartz 
we get 

" (.2 ,n(0) _„/„0^ I /■„ . ,p(0) 



/ 



^^dP^!^_. = oK), I J ga,4ndP^;!^_.\ = 0{n- 
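On the other hand let $C := \max\{|\partial_{x_i} D_{a;i}(x)| \,:\, x \in \operatorname{supp}(\phi_1)\}$. Then as $D_{a;i} = 0$ on $I$, and because $\operatorname{supp}(\partial_{x_i}\phi_n) \subset
\{x \in \mathbb{R}^k \mid 0 < \operatorname{dist}(x,I) < 1/n\}$, we have for $x \in \operatorname{supp}(\partial_{x_i}\phi_n)$ that $|D_{a;i}(x)| \le C\operatorname{dist}(x,I) \le C/n$, and hence
$[\partial_{x_i}\phi_n\,D_{a;i}]$ is bounded uniformly in $n$.

Thus, due to the shrinking of $\{\operatorname{supp}(\phi_n) \cap I^c\}$, for $x \notin I$, $[\partial_{x_i}\phi_n\,D_{a;i}](x) \to 0$. Furthermore $[\partial_{x_i}\phi_n\,D_{a;i}](x) = 0$
on $I$, as $D_{a;i}(x) = 0$ on $I \subset K_i$ by definition, hence also $[\partial_{x_i}\phi_n\,D_{a;i}] \to 0$ pointwise and with dominated
convergence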



/ 



Da;idx;<l>ndPe.i<i = o(;i°). 



;'1 
So we have 



which implies $\gamma_{a;i} = 0$ and hence, integrating by $P_{\theta;-i}$ over any $A \in \mathbb{B}^{k-1}$, $g_{a;i} = 0$ $[P_\theta^{(0)}]$. Similarly, we
obtain

$$g_{a;i,j} = 0 \quad [P_\theta^{(0)}] \tag{B.18}$$


This also entails that

$$\int \partial_{x_i}\varphi\,D_{a;j}\,dP_\theta = \int \partial_{x_i}\varphi\,D_{a;j}\,p_\theta\,d\lambda^k = -\int \varphi\,g_{a;i,j}\,dP_\theta = -\int \varphi\,g_{a;i,j}\,p_\theta\,d\lambda^k. \tag{B.19}$$

Application of Proposition A.7: From (B.19), we get $\int \partial_{x_i}\varphi\,D_{a;j}\,p_\theta\,d\lambda^k = -\int \varphi\,g_{a;i,j}\,p_\theta\,d\lambda^k$ for all $\varphi \in
\mathcal{D}_k$. By Definition A.5 $g_{a;i,j}\,p_\theta$ thus is the weak derivative of $D_{a;j}\,p_\theta$ w.r.t. $x_i$. By Proposition A.7 there
is a modification of $D_{a;j}\,p_\theta$ on a $\lambda^k$-null set such that this modification (for simplicity again denoted by
$D_{a;j}\,p_\theta$) is a.c. in $k$ dimensions. Hence, for $\lambda^k$ a.e. $x$, $D_{a;j}\,p_\theta$ is differentiable w.r.t. $x_i$ in the classical
sense with a derivative coinciding with $g_{a;i,j}\,p_\theta$ up to a $\lambda^k$-null set.

As $D_{a;j}$ is continuously differentiable, $p_\theta$ is differentiable on $K_j^c$ for $\lambda^k$ a.e. $x$, and using again all the
different $D_{a;j}$, $j = 1,\ldots,k$, the same is even true on $K^c$.

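Proof of (ii)(a)-(d): Defining for $\theta \in \Theta$

$$f^{(\theta)} := (p_\theta / |\det\partial_x\iota_\theta|) \circ \tau_\theta, \tag{B.20}$$

and recalling that $P_\theta = F \circ \iota_\theta$, we see that by the Lebesgue transformation formula, $f^{(\theta)}$ must be a density
of $F$, hence the index $\theta$ may be dropped, and (ii)(a) follows. Once again by the transformation formula,

$$p_\theta = |\det\partial_x\iota_\theta|\,(f \circ \iota_\theta) = |\det\partial_x\iota_\theta|\,f_\theta. \tag{B.21}$$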

and thus (ii)(c) holds. For (ii)(b) we consider K defined analogously as for /: = 1 as inverse to £ from 
([23}: We lift ^^ to [0, 1]*^ , giving 

([ yKqedX)^ <.MF)[ y-d[£{P0)] Vv^e<^"([0,l]*,K), 

where gg = pqok, and we have to show that [KqQ](u) = for k G d{[0, 1] ) . We only show mi = 1 , all 
other cases follow similarly. Let i//„ e i^^., i//„ — >I|i|x|o ll'-' ™ ^2(^(^9)) and pointwise. Thenby Fubini 
and by integration by parts 

[\dx,{iir„)Kqg]{{x:y)i)X{dx)X''-'{dy) = 
[O.I]'-' Jo 



,11'-' L 



,1 '•' 



Kqe\o- i goKXIf„[i(Pg)\,\2:k{dx\y)\ We)\2:k{dy) 
But $\int_{[0,1]^k} \psi_n\,d[\ell(P_\theta)] \to 0$ entails by Fubini, $\int_0^1 \psi_n\,d[\ell(P_\theta)]_{1|2:k} \to 0$ $[\ell(P_\theta)]_{2:k}(dy)$ a.e. and by Cauchy-Schwarz
that also $\int_0^1 g \circ \kappa\,\psi_n\,d[\ell(P_\theta)]_{1|2:k} \to 0$ and hence

{[xifnKqe]{{l:y)i} + o{n^))'<o{n°) [[m)hk{dy)] 

and due to continuity of [v/„K"ge] , (ii)(b) follows. For (ii)(d) we proceed as in part (ii) =^ (i) 

i i i 

, L;3v, [/).;,■ |det^,-te|] , LD.;,^,/e a'dg^\detd,ie\ L,P,;,-3,,/e , ^^ ^^^ 

= ^«( \a^^\ + ^ ^'') = ''«( |det5..el +^y^^^^-''^ 

Idetdviel /e pe 

Now (ii)(d) follows from (B.22) and the fact that $V_a$ and all $g_{a;i}$ are in $L_2(P_\theta)$, and assertions (4.5) and
(4.6) from (B.23). □

The next corollary shows that $K$ is uninformative for our problem in the sense that $P_\theta^{(0)}$-a.e. $\Lambda_\theta = 0$.

Corollary B.3. Under the assumptions of Theorem 4.4, setting

$$\Lambda_\theta := V + \sum_i w_i \tag{B.24}$$

with $w_i$ from (B.10) (for $i = 1$) and respectively defined otherwise, it holds that

$$\Lambda_\theta = 0 \quad [P_\theta^{(0)}] \tag{B.25}$$

Proof: (B.24) is defined according to (B.23) on $K^c$, and as (B.18) entails $\sum_i w_i = 0$ $[P_\theta^{(0)}]$, the assertion is
a direct consequence of $x \in K \iff D(x) = 0 \iff \partial_\theta\iota_\theta(x) = 0 \implies V(x) = 0$. □

Appendix B.3. Proofs of Section 6

For the proof of Proposition 6.1 we need two lemmas:

Lemma B.4. The multivariate location model \T2\ is Li -dijferentiable iff it is "partially" in each coordi- 
nate separately, i.e.; 

j (yf{xu...,Xj + h,...,Xk)-^/f{x)(\-i^Af,j{x))fx''{dx)=o{h) (B.26) 

Proof to Lemma B.4: Garel and Hallin (1995, Lemma 2.1). □



Lemma B.5. The multivariate location model UM is L2 -dijferentiable iff it is "partially" in each coordi- 
nate separately, i. e. ; for each i,j= 1 , . . . , fc and each A= A^ eR 

y'(v/to(Ii +/75;.yA) v7((t + HjA)x) - V7W(1 + ^Ai, W))^A*(dx) = o(A2) (B.27) 

where Sij is the matrix in R with but entries except at position i,j. 

Proof to Lemma B.5: With obvious translation we may parallel Garel and Hallin (1995, Lemma 2.1). A
proof is given in Ruckdeschel (2001, Lemma B.3.3). □



Proof to Proposition 6.1: Putting together Lemmas B.4 and B.5 we have reduced the problem to the
respective questions in the one-dimensional location resp. scale model, which is proven in Hájek (1972)
(one-dimensional location) and Swensen (1980, Ch. 2, Sec. 3) (one-dimensional scale); Ruckdeschel and Rieder
(2010, Prop. 3.1) in addition shows that in the pure scale case, we may allow for mass in 0. □
Appendix B.4. Proofs of Section 7

Proof to Proposition l7.4l For fixed 7^ a £ IR'' , the proof goes through word by word as in lHuben ( 1198 ih . 
simply replacing // by a^dgf and fi by fi: By a monotone convergence argument it is shown that we 
may differentiate twice under the integral sign, giving 

A' J V /l /o J ff 

So we conclude that a^^glog/o =a^5glog/i X {dx) a.e., i.e., 

a''dele^oie{x)+a'de^oz\Ae.\.d_Mx)\=a'deie^ois{x)+a'de^og\AAd_,lg(x)\, 
Jo h 

where due to (d) up to a A* -null set -J^oig = -P-oiq , and hence up to a X^ -null set Vlog/o = Vlog/i . 

Integrating this out w.r.t x, , we get by (c) that /o(x) = c,(x_,)/i (x) for A' ' almost all x_/ . Varying i , 
we see that for some c > , c,(x_,) = c for all / = I,. ..,k and for X'' almost all x , and hence 

and c = 1 . As this holds for any 7^ a £ R^ , the assertion for .fg (F) follows. D 



Proof to Proposition 7.6: As by Proposition 7.1 for any $a \in \mathbb{R}^k$ the mapping $F \mapsto \mathcal{I}_\theta(F;a)$ is weakly
lower-semicontinuous, the same goes for the following, recursively defined mappings: Let $a_1 \in \mathbb{R}^k$, $|a_1| =
1$ realize

$$\bar{\mathcal{I}}_{\theta;1}(F) := \bar{\mathcal{I}}_\theta(F) = \max \mathcal{I}_\theta(F;a), \qquad a \in \mathbb{R}^k,\ |a| = 1$$

and for $i = 2,\ldots,k$, assuming $a_j$ already defined for $j = 1,\ldots,i-1$, let $a_i \in \mathbb{R}^k$, $|a_i| = 1$ realize

$$\bar{\mathcal{I}}_{\theta;i}(F) := \max \mathcal{I}_\theta(F;a), \qquad a \in \mathbb{R}^k,\ |a| = 1,\ a \perp \{a_j\}_{j<i}.$$

Then each of the $\bar{\mathcal{I}}_{\theta;i}(F)$, $i = 1,\ldots,k$ is weakly lower-semicontinuous by the same argument as $\bar{\mathcal{I}}_\theta(F)$
and is strictly positive by assumption. Hence for each $i = 1,\ldots,k$, the mapping $F \mapsto 1/\bar{\mathcal{I}}_{\theta;i}(F)$ is weakly
upper-semicontinuous, and so is the sum $\sum_i 1/\bar{\mathcal{I}}_{\theta;i}(F)$. But this sum is just the trace of $[\mathcal{I}_\theta(F)]^{-1}$. The
corresponding statement as to the attainment of the maximum is shown just as Corollary 7.3. □
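The identification of the sum with the trace may be illustrated by a small numerical example of ours. It assumes that $\mathcal{I}_\theta(F;a) = a^\tau \mathcal{I}_\theta(F)\,a$ is a quadratic form in $a$ with a symmetric positive definite matrix, here a hypothetical $2 \times 2$ matrix $M$; the successive constrained maxima are then the eigenvalues of $M$.

```latex
% Assumption: I_\theta(F;a) = a^\tau M a with the hypothetical matrix
\[
  M = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.
\]
% First maximum over |a| = 1: attained at a_1 = (1,1)^\tau/\sqrt{2},
\[
  a_1^\tau M a_1 = \tfrac12\,(2 + 1 + 1 + 2) = 3 .
\]
% Second maximum over |a| = 1, a \perp a_1: a_2 = (1,-1)^\tau/\sqrt{2},
\[
  a_2^\tau M a_2 = \tfrac12\,(2 - 1 - 1 + 2) = 1 .
\]
% Sum of reciprocals versus trace of the inverse:
\[
  \frac13 + \frac11 = \frac43,
  \qquad
  M^{-1} = \frac13 \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix},
  \qquad
  \operatorname{tr} M^{-1} = \frac43 .
\]
```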